
<!DOCTYPE html>

<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />

    <title>Chapter 3: Indexing &#8212; Joyful Pandas 1.0 documentation</title>
<script>
  document.documentElement.dataset.mode = localStorage.getItem("mode") || "";
  document.documentElement.dataset.theme = localStorage.getItem("theme") || "light"
</script>

  <!-- Loaded before other Sphinx assets -->
  <link href="../_static/styles/theme.css?digest=92025949c220c2e29695" rel="stylesheet">
<link href="../_static/styles/pydata-sphinx-theme.css?digest=92025949c220c2e29695" rel="stylesheet">


  <link rel="stylesheet"
    href="../_static/vendor/fontawesome/5.13.0/css/all.min.css">
  <link rel="preload" as="font" type="font/woff2" crossorigin
    href="../_static/vendor/fontawesome/5.13.0/webfonts/fa-solid-900.woff2">
  <link rel="preload" as="font" type="font/woff2" crossorigin
    href="../_static/vendor/fontawesome/5.13.0/webfonts/fa-brands-400.woff2">

    <link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
    <link rel="stylesheet" type="text/css" href="../_static/plot_directive.css" />
    <link rel="stylesheet" type="text/css" href="../_static/css/s4defs-roles.css" />

  <!-- Pre-loaded scripts that we'll load fully later -->
  <link rel="preload" as="script" href="../_static/scripts/pydata-sphinx-theme.js?digest=92025949c220c2e29695">

    <script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
    <script src="../_static/jquery.js"></script>
    <script src="../_static/underscore.js"></script>
    <script src="../_static/_sphinx_javascript_frameworks_compat.js"></script>
    <script src="../_static/doctools.js"></script>
    <script async="async" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
    <link rel="index" title="Index" href="../genindex.html" />
    <link rel="search" title="Search" href="../search.html" />
    <link rel="next" title="Chapter 4: Grouping" href="ch4.html" />
    <link rel="prev" title="Chapter 2: pandas Basics" href="ch2.html" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="docsearch:language" content="en">
  </head>
  
  
  <body data-spy="scroll" data-target="#bd-toc-nav" data-offset="180" data-default-mode="">
    <div class="bd-header-announcement container-fluid" id="banner">
      

    </div>

    
    <nav class="bd-header navbar navbar-light navbar-expand-lg bg-light fixed-top bd-navbar" id="navbar-main"><div class="bd-header__inner container-xl">

  <div id="navbar-start">
    
    
  


<a class="navbar-brand logo" href="../index.html">
  
  
  
  
    <img src="../_static/finallogo1.svg" class="logo__image only-light" alt="Logo image">
    <img src="../_static/finallogo1.svg" class="logo__image only-dark" alt="Logo image">
  
  
</a>
    
  </div>

  <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbar-collapsible" aria-controls="navbar-collapsible" aria-expanded="false" aria-label="Toggle navigation">
    <span class="fas fa-bars"></span>
  </button>

  
  <div id="navbar-collapsible" class="col-lg-9 collapse navbar-collapse">
    <div id="navbar-center" class="mr-auto">
      
      <div class="navbar-center-item">
        <ul id="navbar-main-elements" class="navbar-nav">
    <li class="toctree-l1 nav-item">
 <a class="reference internal nav-link" href="../Home.html">
  Home
 </a>
</li>

<li class="toctree-l1 current active nav-item">
 <a class="reference internal nav-link" href="index.html">
  Content
 </a>
</li>

<li class="toctree-l1 nav-item">
 <a class="reference internal nav-link" href="../Author.html">
  Author
 </a>
</li>

<li class="toctree-l1 nav-item">
 <a class="reference internal nav-link" href="../Datawhale.html">
  Datawhale
 </a>
</li>

<li class="toctree-l1 nav-item">
 <a class="reference internal nav-link" href="../pandas%E6%95%B0%E6%8D%AE%E5%A4%84%E7%90%86%E4%B8%8E%E5%88%86%E6%9E%90.html">
  pandas Data Processing and Analysis
 </a>
</li>

<li class="toctree-l1 nav-item">
 <a class="reference internal nav-link" href="../%E8%A1%A5%E5%85%85%E4%B9%A0%E9%A2%98.html">
  Supplementary Exercises
 </a>
</li>

    
    <li class="nav-item">
        <a class="nav-link nav-external" href="https://pandas.pydata.org/docs/index.html">Doc<i class="fas fa-external-link-alt"></i></a>
    </li>
    
</ul>
      </div>
      
    </div>

    <div id="navbar-end">
      
      <div class="navbar-end-item">
        <span id="theme-switch" class="btn btn-sm btn-outline-primary navbar-btn rounded-circle">
    <a class="theme-switch" data-mode="light"><i class="fas fa-sun"></i></a>
    <a class="theme-switch" data-mode="dark"><i class="far fa-moon"></i></a>
    <a class="theme-switch" data-mode="auto"><i class="fas fa-adjust"></i></a>
</span>
      </div>
      
      <div class="navbar-end-item">
        <ul id="navbar-icon-links" class="navbar-nav" aria-label="Icon Links">
        <li class="nav-item">
          <a class="nav-link" href="https://github.com/datawhalechina/joyful-pandas" rel="noopener" target="_blank" title="GitHub"><span><i class="fab fa-github-square"></i></span>
            <label class="sr-only">GitHub</label></a>
        </li>
      </ul>
      </div>
      
    </div>
  </div>
</div>
    </nav>
    

    <div class="bd-container container-xl">
      <div class="bd-container__inner row">
          

<!-- Only show if we have sidebars configured, else just a small margin  -->
<div class="bd-sidebar-primary col-12 col-md-3 bd-sidebar">
  <div class="sidebar-start-items"><form class="bd-search d-flex align-items-center" action="../search.html" method="get">
  <i class="icon fas fa-search"></i>
  <input type="search" class="form-control" name="q" id="search-input" placeholder="Search the docs ..." aria-label="Search the docs ..." autocomplete="off" >
</form><nav class="bd-links" id="bd-docs-nav" aria-label="Main navigation">
  <div class="bd-toc-item active">
    <ul class="current nav bd-sidenav">
 <li class="toctree-l1">
  <a class="reference internal" href="ch1.html">
   Chapter 1: Preliminaries
  </a>
 </li>
 <li class="toctree-l1">
  <a class="reference internal" href="ch2.html">
   Chapter 2: pandas Basics
  </a>
 </li>
 <li class="toctree-l1 current active">
  <a class="current reference internal" href="#">
   Chapter 3: Indexing
  </a>
 </li>
 <li class="toctree-l1">
  <a class="reference internal" href="ch4.html">
   Chapter 4: Grouping
  </a>
 </li>
 <li class="toctree-l1">
  <a class="reference internal" href="ch5.html">
   Chapter 5: Reshaping
  </a>
 </li>
 <li class="toctree-l1">
  <a class="reference internal" href="ch6.html">
   Chapter 6: Joining
  </a>
 </li>
 <li class="toctree-l1">
  <a class="reference internal" href="ch7.html">
   Chapter 7: Missing Data
  </a>
 </li>
 <li class="toctree-l1">
  <a class="reference internal" href="ch8.html">
   Chapter 8: Text Data
  </a>
 </li>
 <li class="toctree-l1">
  <a class="reference internal" href="ch9.html">
   Chapter 9: Categorical Data
  </a>
 </li>
 <li class="toctree-l1">
  <a class="reference internal" href="ch10.html">
   Chapter 10: Time Series Data
  </a>
 </li>
 <li class="toctree-l1">
  <a class="reference internal" href="%E5%8F%82%E8%80%83%E7%AD%94%E6%A1%88.html">
   Reference Answers
  </a>
 </li>
</ul>

  </div>
</nav>
  </div>
  <div class="sidebar-end-items">
  </div>
</div>


          


<div class="bd-sidebar-secondary d-none d-xl-block col-xl-2 bd-toc">
  
    
    <div class="toc-item">
      
<div class="tocsection onthispage mt-5 pt-1 pb-3">
    <i class="fas fa-list"></i> On this page
</div>

<nav id="bd-toc-nav">
    <ul class="visible nav section-nav flex-column">
 <li class="toc-h2 nav-item toc-entry">
  <a class="reference internal nav-link" href="#id2">
   I. Indexers
  </a>
  <ul class="nav section-nav flex-column">
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#id3">
     1. Column indexing on a DataFrame
    </a>
   </li>
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#id4">
     2. Row indexing on a Series
    </a>
   </li>
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#loc">
     3. The loc indexer
    </a>
   </li>
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#iloc">
     4. The iloc indexer
    </a>
   </li>
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#query">
     5. The query method
    </a>
   </li>
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#id5">
     6. Random sampling
    </a>
   </li>
  </ul>
 </li>
 <li class="toc-h2 nav-item toc-entry">
  <a class="reference internal nav-link" href="#id6">
   II. MultiIndex
  </a>
  <ul class="nav section-nav flex-column">
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#id7">
     1. MultiIndex and the structure of its table
    </a>
   </li>
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#id8">
     2. The loc indexer with a MultiIndex
    </a>
   </li>
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#indexslice">
     3. The IndexSlice object
    </a>
   </li>
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#id9">
     4. Constructing a MultiIndex
    </a>
   </li>
  </ul>
 </li>
 <li class="toc-h2 nav-item toc-entry">
  <a class="reference internal nav-link" href="#id10">
   III. Common index methods
  </a>
  <ul class="nav section-nav flex-column">
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#id11">
     1. Swapping and dropping index levels
    </a>
   </li>
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#id12">
     2. Modifying index attributes
    </a>
   </li>
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#id13">
     3. Setting and resetting the index
    </a>
   </li>
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#id14">
     4. Reshaping the index
    </a>
   </li>
  </ul>
 </li>
 <li class="toc-h2 nav-item toc-entry">
  <a class="reference internal nav-link" href="#id15">
   IV. Index operations
  </a>
  <ul class="nav section-nav flex-column">
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#id16">
     1. Rules of set operations
    </a>
   </li>
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#id17">
     2. General index operations
    </a>
   </li>
  </ul>
 </li>
 <li class="toc-h2 nav-item toc-entry">
  <a class="reference internal nav-link" href="#id18">
   V. Exercises
  </a>
  <ul class="nav section-nav flex-column">
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#ex1">
     Ex1: Company employee dataset
    </a>
   </li>
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#ex2">
     Ex2: Chocolate dataset
    </a>
   </li>
  </ul>
 </li>
</ul>

</nav>
    </div>
    
    <div class="toc-item">
      
    </div>
    
  
</div>


          
          
          <div class="bd-content col-12 col-md-9 col-xl-7">
              
              <article class="bd-article" role="main">
                
  <section id="id1">
<h1>Chapter 3: Indexing<a class="headerlink" href="#id1" title="Permalink to this heading">#</a></h1>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [1]: </span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>

<span class="gp">In [2]: </span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
</pre></div>
</div>
<section id="id2">
<h2>I. Indexers<a class="headerlink" href="#id2" title="Permalink to this heading">#</a></h2>
<section id="id3">
<h3>1. Column indexing on a DataFrame<a class="headerlink" href="#id3" title="Permalink to this heading">#</a></h3>
<p>Column indexing is the most common form of indexing and is usually done with <code class="docutils literal notranslate"><span class="pre">[]</span></code>. Passing <code class="docutils literal notranslate"><span class="pre">[column_name]</span></code> selects the corresponding column from a <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> and returns a <code class="docutils literal notranslate"><span class="pre">Series</span></code>. For example, to select the name column from the table:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [3]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">&#39;data/learn_pandas.csv&#39;</span><span class="p">,</span>
<span class="gp">   ...: </span>                 <span class="n">usecols</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;School&#39;</span><span class="p">,</span> <span class="s1">&#39;Grade&#39;</span><span class="p">,</span> <span class="s1">&#39;Name&#39;</span><span class="p">,</span> <span class="s1">&#39;Gender&#39;</span><span class="p">,</span>
<span class="gp">   ...: </span>                            <span class="s1">&#39;Weight&#39;</span><span class="p">,</span> <span class="s1">&#39;Transfer&#39;</span><span class="p">])</span>
<span class="gp">   ...: </span>

<span class="gp">In [4]: </span><span class="n">df</span><span class="p">[</span><span class="s1">&#39;Name&#39;</span><span class="p">]</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[4]: </span>
<span class="go">0      Gaopeng Yang</span>
<span class="go">1    Changqiang You</span>
<span class="go">2           Mei Sun</span>
<span class="go">3      Xiaojuan Sun</span>
<span class="go">4       Gaojuan You</span>
<span class="go">Name: Name, dtype: object</span>
</pre></div>
</div>
<p>To select several columns, pass <code class="docutils literal notranslate"><span class="pre">[list_of_column_names]</span></code>; the result is a <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code>. For example, to select the gender and name columns from the table:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [5]: </span><span class="n">df</span><span class="p">[[</span><span class="s1">&#39;Gender&#39;</span><span class="p">,</span> <span class="s1">&#39;Name&#39;</span><span class="p">]]</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[5]: </span>
<span class="go">   Gender            Name</span>
<span class="go">0  Female    Gaopeng Yang</span>
<span class="go">1    Male  Changqiang You</span>
<span class="go">2    Male         Mei Sun</span>
<span class="go">3  Female    Xiaojuan Sun</span>
<span class="go">4    Male     Gaojuan You</span>
</pre></div>
</div>
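<p>In addition, a single column whose name contains no spaces (more precisely, whose name is a valid Python identifier that does not clash with an existing <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> attribute or method) can be selected with <code class="docutils literal notranslate"><span class="pre">.column_name</span></code>, which is equivalent to <code class="docutils literal notranslate"><span class="pre">[column_name]</span></code>:</p>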
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [6]: </span><span class="n">df</span><span class="o">.</span><span class="n">Name</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[6]: </span>
<span class="go">0      Gaopeng Yang</span>
<span class="go">1    Changqiang You</span>
<span class="go">2           Mei Sun</span>
<span class="go">3      Xiaojuan Sun</span>
<span class="go">4       Gaojuan You</span>
<span class="go">Name: Name, dtype: object</span>
</pre></div>
</div>
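<p>The three access forms above can be compared side by side in a minimal sketch. The table <code class="docutils literal notranslate"><span class="pre">toy</span></code> and its column names are hypothetical and chosen only for illustration; they are not part of the tutorial's dataset. The sketch also shows two cases where attribute access is not available and bracket access is needed: a column name containing a space, and a column name that collides with a <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> method.</p>

```python
import pandas as pd

# Hypothetical toy table; the column names are assumptions made for this sketch.
toy = pd.DataFrame({
    'Name': ['A', 'B'],
    'Test Number': [1, 2],   # contains a space, so toy.Test Number is not valid syntax
    'count': [10, 20],       # collides with the DataFrame.count method
})

# Bracket access and attribute access return the same Series for a plain name.
by_bracket = toy['Name']
by_attribute = toy.Name
same = by_bracket.equals(by_attribute)

# A list of names returns a DataFrame, even when the list has one element.
two_columns = toy[['Name', 'count']]

# Bracket access still works when attribute access cannot be used.
spaced = toy['Test Number']
count_is_method = callable(toy.count)   # toy.count is the method, not the column
count_column = toy['count']
```

<p>Because bracket access works for every column name, it is the safer default in scripts; attribute access is mainly a convenience for interactive use.</p>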
</section>
<section id="id4">
<h3>2. Row indexing on a Series<a class="headerlink" href="#id4" title="Permalink to this heading">#</a></h3>
<p>(a) A <code class="docutils literal notranslate"><span class="pre">Series</span></code> with a string index</p>
<p>To retrieve the element for a single index label, use <code class="docutils literal notranslate"><span class="pre">[item]</span></code>. If the label matches exactly one value in the <code class="docutils literal notranslate"><span class="pre">Series</span></code>, that scalar is returned; if it matches several values, a <code class="docutils literal notranslate"><span class="pre">Series</span></code> is returned:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [7]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">],</span>
<span class="gp">   ...: </span>               <span class="n">index</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;a&#39;</span><span class="p">,</span> <span class="s1">&#39;b&#39;</span><span class="p">,</span> <span class="s1">&#39;a&#39;</span><span class="p">,</span> <span class="s1">&#39;a&#39;</span><span class="p">,</span> <span class="s1">&#39;a&#39;</span><span class="p">,</span> <span class="s1">&#39;c&#39;</span><span class="p">])</span>
<span class="gp">   ...: </span>

<span class="gp">In [8]: </span><span class="n">s</span><span class="p">[</span><span class="s1">&#39;a&#39;</span><span class="p">]</span>
<span class="gh">Out[8]: </span>
<span class="go">a    1</span>
<span class="go">a    3</span>
<span class="go">a    4</span>
<span class="go">a    5</span>
<span class="go">dtype: int64</span>

<span class="gp">In [9]: </span><span class="n">s</span><span class="p">[</span><span class="s1">&#39;b&#39;</span><span class="p">]</span>
<span class="gh">Out[9]: </span><span class="go">2</span>
</pre></div>
</div>
<p>To retrieve the elements for several index labels, use <code class="docutils literal notranslate"><span class="pre">[list_of_items]</span></code>:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [10]: </span><span class="n">s</span><span class="p">[[</span><span class="s1">&#39;c&#39;</span><span class="p">,</span> <span class="s1">&#39;b&#39;</span><span class="p">]]</span>
<span class="gh">Out[10]: </span>
<span class="go">c    6</span>
<span class="go">b    2</span>
<span class="go">dtype: int64</span>
</pre></div>
</div>
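<p>To retrieve the elements between two index labels, a slice can be used provided that each of the two labels occurs only once in the whole index. Note that a label slice includes both endpoints:</p>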
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [11]: </span><span class="n">s</span><span class="p">[</span><span class="s1">&#39;c&#39;</span><span class="p">:</span> <span class="s1">&#39;b&#39;</span><span class="p">:</span> <span class="o">-</span><span class="mi">2</span><span class="p">]</span>
<span class="gh">Out[11]: </span>
<span class="go">c    6</span>
<span class="go">a    4</span>
<span class="go">b    2</span>
<span class="go">dtype: int64</span>
</pre></div>
</div>
<p>If either endpoint label is duplicated, that is, not unique, the index must be sorted before it can be sliced:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [12]: </span><span class="k">try</span><span class="p">:</span>
<span class="gp">   ....: </span>    <span class="n">s</span><span class="p">[</span><span class="s1">&#39;a&#39;</span><span class="p">:</span> <span class="s1">&#39;b&#39;</span><span class="p">]</span>
<span class="gp">   ....: </span><span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="gp">   ....: </span>    <span class="n">Err_Msg</span> <span class="o">=</span> <span class="n">e</span>
<span class="gp">   ....: </span>

<span class="gp">In [13]: </span><span class="n">Err_Msg</span>
<span class="gh">Out[13]: </span><span class="go">KeyError(&quot;Cannot get left slice bound for non-unique label: &#39;a&#39;&quot;)</span>

<span class="gp">In [14]: </span><span class="n">s</span><span class="o">.</span><span class="n">sort_index</span><span class="p">()[</span><span class="s1">&#39;a&#39;</span><span class="p">:</span> <span class="s1">&#39;b&#39;</span><span class="p">]</span>
<span class="gh">Out[14]: </span>
<span class="go">a    1</span>
<span class="go">a    3</span>
<span class="go">a    4</span>
<span class="go">a    5</span>
<span class="go">b    2</span>
<span class="go">dtype: int64</span>
</pre></div>
</div>
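<p>As a minimal sketch of the rule above, the snippet below rebuilds the same kind of <code class="docutils literal notranslate"><span class="pre">Series</span></code> (named <code class="docutils literal notranslate"><span class="pre">s_dup</span></code> here, a name chosen for this sketch) and checks <code class="docutils literal notranslate"><span class="pre">index.is_monotonic_increasing</span></code> before slicing. Using that property as a guard is a suggestion of this example, not something the text prescribes.</p>

```python
import pandas as pd

# Same shape as the example above: label 'a' is duplicated and the index is unsorted.
s_dup = pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'a', 'a', 'a', 'c'])

# An unsorted index with a duplicated endpoint cannot be sliced by label.
is_sorted = s_dup.index.is_monotonic_increasing
try:
    s_dup['a':'b']
    raised = False
except KeyError:
    raised = True

# After sorting, the duplicated labels are contiguous and the slice is well defined.
sorted_slice = s_dup.sort_index()['a':'b']
labels = sorted_slice.index.tolist()
```

<p>The sorted slice contains every <code class="docutils literal notranslate"><span class="pre">a</span></code> row followed by the <code class="docutils literal notranslate"><span class="pre">b</span></code> row, since both endpoints are included.</p>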
<p>(b) A <code class="docutils literal notranslate"><span class="pre">Series</span></code> with an integer index</p>
<p>When a data-reading function is called without designating a column as the index, a default integer index starting from 0 is generated. Any set of integers of the right length can also serve as an index.</p>
<p>As with strings, <code class="docutils literal notranslate"><span class="pre">[int]</span></code> or <code class="docutils literal notranslate"><span class="pre">[int_list]</span></code> retrieves the values whose index <span class="red">label</span> matches:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [15]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="s1">&#39;a&#39;</span><span class="p">,</span> <span class="s1">&#39;b&#39;</span><span class="p">,</span> <span class="s1">&#39;c&#39;</span><span class="p">,</span> <span class="s1">&#39;d&#39;</span><span class="p">,</span> <span class="s1">&#39;e&#39;</span><span class="p">,</span> <span class="s1">&#39;f&#39;</span><span class="p">],</span>
<span class="gp">   ....: </span>              <span class="n">index</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">4</span><span class="p">])</span>
<span class="gp">   ....: </span>

<span class="gp">In [16]: </span><span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="gh">Out[16]: </span>
<span class="go">1    a</span>
<span class="go">1    c</span>
<span class="go">dtype: object</span>

<span class="gp">In [17]: </span><span class="n">s</span><span class="p">[[</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">]]</span>
<span class="gh">Out[17]: </span>
<span class="go">2    d</span>
<span class="go">3    b</span>
<span class="go">dtype: object</span>
</pre></div>
</div>
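<p>An integer slice, by contrast, retrieves the values at the corresponding index <span class="red">positions</span>. Note that, as with slicing in <code class="docutils literal notranslate"><span class="pre">Python</span></code>, the right endpoint is excluded:</p>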
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [18]: </span><span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="o">-</span><span class="mi">1</span><span class="p">:</span><span class="mi">2</span><span class="p">]</span>
<span class="gh">Out[18]: </span>
<span class="go">3    b</span>
<span class="go">2    d</span>
<span class="go">dtype: object</span>
</pre></div>
</div>
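<p>Because <code class="docutils literal notranslate"><span class="pre">[]</span></code> switches between label lookup and positional slicing on an integer index, it can help to state the intent explicitly. The sketch below uses the <code class="docutils literal notranslate"><span class="pre">loc</span></code> and <code class="docutils literal notranslate"><span class="pre">iloc</span></code> indexers, which are introduced in the following sections, on the same data as above; the variable name <code class="docutils literal notranslate"><span class="pre">s_int</span></code> is chosen for this example.</p>

```python
import pandas as pd

# Same data as the example above: an unsorted integer index with a duplicated label.
s_int = pd.Series(['a', 'b', 'c', 'd', 'e', 'f'], index=[1, 3, 1, 2, 5, 4])

# Label lookup: label 1 appears twice, so a Series with two rows is returned.
by_label = s_int.loc[1]

# Positional lookup: position 1 is the second element, whatever its label is.
by_position = s_int.iloc[1]

# Positional slice, right endpoint excluded: positions 1 and 3.
positional_slice = s_int.iloc[1:-1:2]
```

<p>Here <code class="docutils literal notranslate"><span class="pre">by_label</span></code> holds the two rows labelled 1, <code class="docutils literal notranslate"><span class="pre">by_position</span></code> is the single value at position 1, and <code class="docutils literal notranslate"><span class="pre">positional_slice</span></code> matches the result of the integer slice shown above.</p>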
<div class="caution admonition">
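<p class="admonition-title">A note on index types</p>
<blockquote>
<div><p>To stay out of trouble, do not use pure floating-point values or any mixture of types (strings, integers, floats, and so on) as an index. Doing so may raise errors or return unexpected results in specific operations, and there is no practical motivation for it in real data analysis.</p>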
</div></blockquote>
</div>
</section>
<section id="loc">
<h3>3. The loc indexer<a class="headerlink" href="#loc" title="Permalink to this heading">#</a></h3>
<p>The previous sections covered selecting columns of a <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code>; this section turns to selecting its rows. A table has two indexers: the <span class="red">label</span>-based <code class="docutils literal notranslate"><span class="pre">loc</span></code> indexer and the <span class="red">position</span>-based <code class="docutils literal notranslate"><span class="pre">iloc</span></code> indexer.</p>
<p>The general form of the <code class="docutils literal notranslate"><span class="pre">loc</span></code> indexer is <code class="docutils literal notranslate"><span class="pre">loc[*,</span> <span class="pre">*]</span></code>, where the first <code class="docutils literal notranslate"><span class="pre">*</span></code> selects rows and the second <code class="docutils literal notranslate"><span class="pre">*</span></code> selects columns. If the second position is omitted, as in <code class="docutils literal notranslate"><span class="pre">loc[*]</span></code>, the <code class="docutils literal notranslate"><span class="pre">*</span></code> filters rows. Five kinds of objects are valid in a <code class="docutils literal notranslate"><span class="pre">*</span></code> position: a single label, a list of labels, a slice of labels, a boolean list, and a function. Each is described in turn below.</p>
<p>To demonstrate these operations, first use the <code class="docutils literal notranslate"><span class="pre">set_index</span></code> method to make the <code class="docutils literal notranslate"><span class="pre">Name</span></code> column the index. Other uses of this method are covered later in this chapter, in the section on setting and resetting the index.</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [19]: </span><span class="n">df_demo</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">set_index</span><span class="p">(</span><span class="s1">&#39;Name&#39;</span><span class="p">)</span>

<span class="gp">In [20]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[20]: </span>
<span class="go">                                       School      Grade  Gender  Weight Transfer</span>
<span class="go">Name                                                                             </span>
<span class="go">Gaopeng Yang    Shanghai Jiao Tong University   Freshman  Female    46.0        N</span>
<span class="go">Changqiang You              Peking University   Freshman    Male    70.0        N</span>
<span class="go">Mei Sun         Shanghai Jiao Tong University     Senior    Male    89.0        N</span>
<span class="go">Xiaojuan Sun                 Fudan University  Sophomore  Female    41.0        N</span>
<span class="go">Gaojuan You                  Fudan University  Sophomore    Male    74.0        N</span>
</pre></div>
</div>
<p>(a) <code class="docutils literal notranslate"><span class="pre">*</span></code> is a single label</p>
<p>In this case the matching row or column is returned directly. If the label is duplicated in the index the result is a <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code>; otherwise it is a <code class="docutils literal notranslate"><span class="pre">Series</span></code>:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [21]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="s1">&#39;Qiang Sun&#39;</span><span class="p">]</span> <span class="c1"># several people share this name</span>
<span class="gh">Out[21]: </span>
<span class="go">                                  School      Grade  Gender  Weight Transfer</span>
<span class="go">Name                                                                        </span>
<span class="go">Qiang Sun            Tsinghua University     Junior  Female    53.0        N</span>
<span class="go">Qiang Sun            Tsinghua University  Sophomore  Female    40.0        N</span>
<span class="go">Qiang Sun  Shanghai Jiao Tong University     Junior  Female     NaN        N</span>

<span class="gp">In [22]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="s1">&#39;Quan Zhao&#39;</span><span class="p">]</span> <span class="c1"># the name is unique</span>
<span class="gh">Out[22]: </span>
<span class="go">School      Shanghai Jiao Tong University</span>
<span class="go">Grade                              Junior</span>
<span class="go">Gender                             Female</span>
<span class="go">Weight                               53.0</span>
<span class="go">Transfer                                N</span>
<span class="go">Name: Quan Zhao, dtype: object</span>
</pre></div>
</div>
<p>Rows and columns can also be selected at the same time:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [23]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="s1">&#39;Qiang Sun&#39;</span><span class="p">,</span> <span class="s1">&#39;School&#39;</span><span class="p">]</span> <span class="c1"># returns a Series</span>
<span class="gh">Out[23]: </span>
<span class="go">Name</span>
<span class="go">Qiang Sun              Tsinghua University</span>
<span class="go">Qiang Sun              Tsinghua University</span>
<span class="go">Qiang Sun    Shanghai Jiao Tong University</span>
<span class="go">Name: School, dtype: object</span>

<span class="gp">In [24]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="s1">&#39;Quan Zhao&#39;</span><span class="p">,</span> <span class="s1">&#39;School&#39;</span><span class="p">]</span> <span class="c1"># returns a single element</span>
<span class="gh">Out[24]: </span><span class="go">&#39;Shanghai Jiao Tong University&#39;</span>
</pre></div>
</div>
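<p>The four return types seen above (a <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code>, a row <code class="docutils literal notranslate"><span class="pre">Series</span></code>, a column <code class="docutils literal notranslate"><span class="pre">Series</span></code>, and a scalar) can be reproduced without the course dataset. The sketch below uses a hypothetical three-row table; the names and values in it are invented for illustration.</p>

```python
import pandas as pd

# Hypothetical table: 'Ann' is duplicated in the index, 'Bob' is unique.
toy = pd.DataFrame(
    {'School': ['X', 'Y', 'X'], 'Weight': [50.0, 60.0, 55.0]},
    index=pd.Index(['Ann', 'Bob', 'Ann'], name='Name'),
)

dup_rows = toy.loc['Ann']               # duplicated label: DataFrame with two rows
unique_row = toy.loc['Bob']             # unique label: Series holding one row
dup_cell = toy.loc['Ann', 'School']     # duplicated label plus one column: Series
unique_cell = toy.loc['Bob', 'School']  # unique label plus one column: scalar
```

<p>Since the return type depends on whether the label is duplicated, code that must always receive a <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> can pass a one-element list such as <code class="docutils literal notranslate"><span class="pre">toy.loc[['Bob']]</span></code> instead.</p>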
<p>(b) <code class="docutils literal notranslate"><span class="pre">*</span></code> is a list of labels</p>
<p>In this case the rows or columns matching every label in the list are returned:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [25]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">loc</span><span class="p">[[</span><span class="s1">&#39;Qiang Sun&#39;</span><span class="p">,</span><span class="s1">&#39;Quan Zhao&#39;</span><span class="p">],</span> <span class="p">[</span><span class="s1">&#39;School&#39;</span><span class="p">,</span><span class="s1">&#39;Gender&#39;</span><span class="p">]]</span>
<span class="gh">Out[25]: </span>
<span class="go">                                  School  Gender</span>
<span class="go">Name                                            </span>
<span class="go">Qiang Sun            Tsinghua University  Female</span>
<span class="go">Qiang Sun            Tsinghua University  Female</span>
<span class="go">Qiang Sun  Shanghai Jiao Tong University  Female</span>
<span class="go">Quan Zhao  Shanghai Jiao Tong University  Female</span>
</pre></div>
</div>
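<p>(c) <code class="docutils literal notranslate"><span class="pre">*</span></code> is a slice</p>
<p>As noted earlier for a <code class="docutils literal notranslate"><span class="pre">Series</span></code> with a string index, a slice can be used when the start and end labels are unique, and it includes both endpoints; if they are not unique, an error is raised:</p>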
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [26]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="s1">&#39;Gaojuan You&#39;</span><span class="p">:</span><span class="s1">&#39;Gaoqiang Qian&#39;</span><span class="p">,</span> <span class="s1">&#39;School&#39;</span><span class="p">:</span><span class="s1">&#39;Gender&#39;</span><span class="p">]</span>
<span class="gh">Out[26]: </span>
<span class="go">                                      School      Grade  Gender</span>
<span class="go">Name                                                           </span>
<span class="go">Gaojuan You                 Fudan University  Sophomore    Male</span>
<span class="go">Xiaoli Qian              Tsinghua University   Freshman  Female</span>
<span class="go">Qiang Chu      Shanghai Jiao Tong University   Freshman  Female</span>
<span class="go">Gaoqiang Qian            Tsinghua University     Junior  Female</span>
</pre></div>
</div>
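<p>Note that when a <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> has an integer index, integer slices follow the same rules as the string-index case above: they are <span class="red">label</span> slices, they include both endpoints, and the start and end labels must not be duplicated.</p>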
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [27]: </span><span class="n">df_loc_slice_demo</span> <span class="o">=</span> <span class="n">df_demo</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>

<span class="gp">In [28]: </span><span class="n">df_loc_slice_demo</span><span class="o">.</span><span class="n">index</span> <span class="o">=</span> <span class="nb">range</span><span class="p">(</span><span class="n">df_demo</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="mi">0</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>

<span class="gp">In [29]: </span><span class="n">df_loc_slice_demo</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="mi">5</span><span class="p">:</span><span class="mi">3</span><span class="p">]</span>
<span class="gh">Out[29]: </span>
<span class="go">                          School   Grade  Gender  Weight Transfer</span>
<span class="go">5               Fudan University  Junior  Female    46.0        N</span>
<span class="go">4            Tsinghua University  Senior  Female    50.0        N</span>
<span class="go">3  Shanghai Jiao Tong University  Senior  Female    45.0        N</span>

<span class="gp">In [30]: </span><span class="n">df_loc_slice_demo</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="mi">3</span><span class="p">:</span><span class="mi">5</span><span class="p">]</span> <span class="c1"># returns nothing, which shows this is not integer-position slicing</span>
<span class="gh">Out[30]: </span>
<span class="go">Empty DataFrame</span>
<span class="go">Columns: [School, Grade, Gender, Weight, Transfer]</span>
<span class="go">Index: []</span>
</pre></div>
</div>
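<p>The difference between label slicing and positional slicing can be sketched on a small example. The <code class="docutils literal notranslate"><span class="pre">Series</span></code> below is hypothetical toy data (it is not part of the <code class="docutils literal notranslate"><span class="pre">learn_pandas</span></code> dataset), with integer labels in descending order like <code class="docutils literal notranslate"><span class="pre">df_loc_slice_demo</span></code>:</p>

```python
import pandas as pd

# Hypothetical toy data: integer labels that run in descending order.
s = pd.Series([10, 20, 30, 40], index=[4, 3, 2, 1])

# Label slice: from label 3 through label 1, both endpoints included.
label_slice = s.loc[3:1]

# Label 1 comes after label 3 in this index, so the slice 1:3 is empty.
reversed_slice = s.loc[1:3]

# Positional slice: positions 1 and 2 only, end point excluded.
position_slice = s.iloc[1:3]

print(label_slice.tolist())
print(reversed_slice.tolist())
print(position_slice.tolist())
```

<p>The same numbers <code class="docutils literal notranslate"><span class="pre">1:3</span></code> therefore select different rows depending on whether they are read as labels or as positions.</p>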
<p>【d】 <code class="docutils literal notranslate"><span class="pre">*</span></code> is a boolean list</p>
<p>In practice, filtering rows by a condition is extremely common. The boolean list passed to <code class="docutils literal notranslate"><span class="pre">loc</span></code> must have the same length as the <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code>; rows at positions where the list is <code class="docutils literal notranslate"><span class="pre">True</span></code> are selected, and rows where it is <code class="docutils literal notranslate"><span class="pre">False</span></code> are dropped.</p>
<p>For example, select the students who weigh more than 70kg:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [31]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="n">df_demo</span><span class="o">.</span><span class="n">Weight</span><span class="o">&gt;</span><span class="mi">70</span><span class="p">]</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[31]: </span>
<span class="go">                                      School      Grade Gender  Weight Transfer</span>
<span class="go">Name                                                                           </span>
<span class="go">Mei Sun        Shanghai Jiao Tong University     Senior   Male    89.0        N</span>
<span class="go">Gaojuan You                 Fudan University  Sophomore   Male    74.0        N</span>
<span class="go">Xiaopeng Zhou  Shanghai Jiao Tong University   Freshman   Male    74.0        N</span>
<span class="go">Xiaofeng Sun             Tsinghua University     Senior   Male    71.0        N</span>
<span class="go">Qiang Zheng    Shanghai Jiao Tong University     Senior   Male    87.0        N</span>
</pre></div>
</div>
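<p>The label-list selection described earlier can equivalently be written with the boolean list returned by the <code class="docutils literal notranslate"><span class="pre">isin</span></code> method. For example, select all freshmen and seniors:</p>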
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [32]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="n">df_demo</span><span class="o">.</span><span class="n">Grade</span><span class="o">.</span><span class="n">isin</span><span class="p">([</span><span class="s1">&#39;Freshman&#39;</span><span class="p">,</span> <span class="s1">&#39;Senior&#39;</span><span class="p">])]</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[32]: </span>
<span class="go">                                       School     Grade  Gender  Weight Transfer</span>
<span class="go">Name                                                                            </span>
<span class="go">Gaopeng Yang    Shanghai Jiao Tong University  Freshman  Female    46.0        N</span>
<span class="go">Changqiang You              Peking University  Freshman    Male    70.0        N</span>
<span class="go">Mei Sun         Shanghai Jiao Tong University    Senior    Male    89.0        N</span>
<span class="go">Xiaoli Qian               Tsinghua University  Freshman  Female    51.0        N</span>
<span class="go">Qiang Chu       Shanghai Jiao Tong University  Freshman  Female    52.0        N</span>
</pre></div>
</div>
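<p>As a minimal sketch of this equivalence, the hypothetical toy table below (names and values are made up for illustration) selects the same rows once with a list of row labels and once with the boolean list returned by <code class="docutils literal notranslate"><span class="pre">isin</span></code>:</p>

```python
import pandas as pd

# Hypothetical toy table, not the learn_pandas dataset.
toy = pd.DataFrame(
    {"Grade": ["Freshman", "Junior", "Senior", "Sophomore"],
     "Weight": [46.0, 75.0, 89.0, 68.0]},
    index=["Ann", "Bob", "Cat", "Dan"],
)

# A comparison yields a boolean Series with one entry per row.
heavy = toy.loc[toy.Weight > 70]

# Selecting by a list of row labels ...
by_labels = toy.loc[["Ann", "Cat"]]

# ... gives the same rows as a boolean list built with isin on the index.
by_index_isin = toy.loc[toy.index.isin(["Ann", "Cat"])]

# isin works on any column as well, here on Grade.
by_grade_isin = toy.loc[toy.Grade.isin(["Freshman", "Senior"])]

print(heavy)
print(by_grade_isin)
```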
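<p>Compound conditions can be built by combining <code class="docutils literal notranslate"><span class="pre">|</span></code> (or), <code class="docutils literal notranslate"><span class="pre">&amp;</span></code> (and), and <code class="docutils literal notranslate"><span class="pre">~</span></code> (not). For example, select the Fudan University seniors who weigh more than 70kg, or the Peking University students who are not seniors and weigh more than 80kg:</p>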
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [33]: </span><span class="n">condition_1_1</span> <span class="o">=</span> <span class="n">df_demo</span><span class="o">.</span><span class="n">School</span> <span class="o">==</span> <span class="s1">&#39;Fudan University&#39;</span>

<span class="gp">In [34]: </span><span class="n">condition_1_2</span> <span class="o">=</span> <span class="n">df_demo</span><span class="o">.</span><span class="n">Grade</span> <span class="o">==</span> <span class="s1">&#39;Senior&#39;</span>

<span class="gp">In [35]: </span><span class="n">condition_1_3</span> <span class="o">=</span> <span class="n">df_demo</span><span class="o">.</span><span class="n">Weight</span> <span class="o">&gt;</span> <span class="mi">70</span>

<span class="gp">In [36]: </span><span class="n">condition_1</span> <span class="o">=</span> <span class="n">condition_1_1</span> <span class="o">&amp;</span> <span class="n">condition_1_2</span> <span class="o">&amp;</span> <span class="n">condition_1_3</span>

<span class="gp">In [37]: </span><span class="n">condition_2_1</span> <span class="o">=</span> <span class="n">df_demo</span><span class="o">.</span><span class="n">School</span> <span class="o">==</span> <span class="s1">&#39;Peking University&#39;</span>

<span class="gp">In [38]: </span><span class="n">condition_2_2</span> <span class="o">=</span> <span class="n">df_demo</span><span class="o">.</span><span class="n">Grade</span> <span class="o">==</span> <span class="s1">&#39;Senior&#39;</span>

<span class="gp">In [39]: </span><span class="n">condition_2_3</span> <span class="o">=</span> <span class="n">df_demo</span><span class="o">.</span><span class="n">Weight</span> <span class="o">&gt;</span> <span class="mi">80</span>

<span class="gp">In [40]: </span><span class="n">condition_2</span> <span class="o">=</span> <span class="n">condition_2_1</span> <span class="o">&amp;</span> <span class="p">(</span><span class="o">~</span><span class="n">condition_2_2</span><span class="p">)</span> <span class="o">&amp;</span> <span class="n">condition_2_3</span>

<span class="gp">In [41]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="n">condition_1</span> <span class="o">|</span> <span class="n">condition_2</span><span class="p">]</span>
<span class="gh">Out[41]: </span>
<span class="go">                           School     Grade Gender  Weight Transfer</span>
<span class="go">Name                                                               </span>
<span class="go">Qiang Han       Peking University  Freshman   Male    87.0        N</span>
<span class="go">Chengpeng Zhou   Fudan University    Senior   Male    81.0        N</span>
<span class="go">Changpeng Zhao  Peking University  Freshman   Male    83.0        N</span>
<span class="go">Chengpeng Qian   Fudan University    Senior   Male    73.0        Y</span>
</pre></div>
</div>
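<p>One detail worth illustrating is operator precedence: <code class="docutils literal notranslate"><span class="pre">&amp;</span></code>, <code class="docutils literal notranslate"><span class="pre">|</span></code> and <code class="docutils literal notranslate"><span class="pre">~</span></code> bind more tightly than comparison operators, so each comparison should be wrapped in parentheses when the conditions are written inline. The sketch below uses a hypothetical toy table:</p>

```python
import pandas as pd

# Hypothetical toy table used only for illustration.
toy = pd.DataFrame({"A": [1, 2, 3, 4], "B": [4, 3, 2, 1]})

# Each comparison is parenthesized before being combined.
both = toy.loc[(toy.A > 1) & (toy.B > 1)]
either = toy.loc[(toy.A > 3) | (toy.B > 3)]
neither = toy.loc[~((toy.A > 3) | (toy.B > 3))]

# Without parentheses, Python reads the expression as the chained
# comparison toy.A > (1 & toy.B) > 1, which needs the truth value of
# a whole Series and therefore raises ValueError.
try:
    toy.loc[toy.A > 1 & toy.B > 1]
    unparenthesized_ok = True
except ValueError:
    unparenthesized_ok = False

print(both.index.tolist(), either.index.tolist(), neither.index.tolist())
print(unparenthesized_ok)
```

<p>Storing each condition in its own variable, as in the example above, avoids the problem in the same way.</p>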
<div class="hint admonition">
<p class="admonition-title">Exercise</p>
<blockquote>
<div><p><code class="docutils literal notranslate"><span class="pre">select_dtypes</span></code> is a handy method that selects the columns of a table by data type. To select all numeric columns, simply call <code class="docutils literal notranslate"><span class="pre">.select_dtypes('number')</span></code> . Using boolean-list selection together with the <code class="docutils literal notranslate"><span class="pre">dtypes</span></code> attribute of <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> , implement the same functionality on the <code class="docutils literal notranslate"><span class="pre">learn_pandas</span></code> dataset.</p>
</div></blockquote>
</div>
<p>【e】 <code class="docutils literal notranslate"><span class="pre">*</span></code> is a function</p>
<p>The function must return one of the four valid forms above, and its input is the <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> itself. Taking the compound-condition filter above as the example again, the logic can be wrapped in a function that returns the result. Note that the parameter <code class="docutils literal notranslate"><span class="pre">x</span></code> is in effect <code class="docutils literal notranslate"><span class="pre">df_demo</span></code> :</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [42]: </span><span class="k">def</span> <span class="nf">condition</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="gp">   ....: </span>    <span class="n">condition_1_1</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">School</span> <span class="o">==</span> <span class="s1">&#39;Fudan University&#39;</span>
<span class="gp">   ....: </span>    <span class="n">condition_1_2</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">Grade</span> <span class="o">==</span> <span class="s1">&#39;Senior&#39;</span>
<span class="gp">   ....: </span>    <span class="n">condition_1_3</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">Weight</span> <span class="o">&gt;</span> <span class="mi">70</span>
<span class="gp">   ....: </span>    <span class="n">condition_1</span> <span class="o">=</span> <span class="n">condition_1_1</span> <span class="o">&amp;</span> <span class="n">condition_1_2</span> <span class="o">&amp;</span> <span class="n">condition_1_3</span>
<span class="gp">   ....: </span>    <span class="n">condition_2_1</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">School</span> <span class="o">==</span> <span class="s1">&#39;Peking University&#39;</span>
<span class="gp">   ....: </span>    <span class="n">condition_2_2</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">Grade</span> <span class="o">==</span> <span class="s1">&#39;Senior&#39;</span>
<span class="gp">   ....: </span>    <span class="n">condition_2_3</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">Weight</span> <span class="o">&gt;</span> <span class="mi">80</span>
<span class="gp">   ....: </span>    <span class="n">condition_2</span> <span class="o">=</span> <span class="n">condition_2_1</span> <span class="o">&amp;</span> <span class="p">(</span><span class="o">~</span><span class="n">condition_2_2</span><span class="p">)</span> <span class="o">&amp;</span> <span class="n">condition_2_3</span>
<span class="gp">   ....: </span>    <span class="n">result</span> <span class="o">=</span> <span class="n">condition_1</span> <span class="o">|</span> <span class="n">condition_2</span>
<span class="gp">   ....: </span>    <span class="k">return</span> <span class="n">result</span>
<span class="gp">   ....: </span>

<span class="gp">In [43]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="n">condition</span><span class="p">]</span>
<span class="gh">Out[43]: </span>
<span class="go">                           School     Grade Gender  Weight Transfer</span>
<span class="go">Name                                                               </span>
<span class="go">Qiang Han       Peking University  Freshman   Male    87.0        N</span>
<span class="go">Chengpeng Zhou   Fudan University    Senior   Male    81.0        N</span>
<span class="go">Changpeng Zhao  Peking University  Freshman   Male    83.0        N</span>
<span class="go">Chengpeng Qian   Fudan University    Senior   Male    73.0        Y</span>
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">lambda</span></code> expressions are supported as well; their return value must likewise be one of the four forms mentioned earlier:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [44]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span><span class="s1">&#39;Quan Zhao&#39;</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span><span class="s1">&#39;Gender&#39;</span><span class="p">]</span>
<span class="gh">Out[44]: </span><span class="go">&#39;Female&#39;</span>
</pre></div>
</div>
<p>Because a function cannot return the slice syntax <code class="docutils literal notranslate"><span class="pre">start:</span> <span class="pre">end:</span> <span class="pre">step</span></code> directly, a returned slice has to be wrapped in a <code class="docutils literal notranslate"><span class="pre">slice</span></code> object:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [45]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nb">slice</span><span class="p">(</span><span class="s1">&#39;Gaojuan You&#39;</span><span class="p">,</span> <span class="s1">&#39;Gaoqiang Qian&#39;</span><span class="p">)]</span>
<span class="gh">Out[45]: </span>
<span class="go">                                      School      Grade  Gender  Weight Transfer</span>
<span class="go">Name                                                                            </span>
<span class="go">Gaojuan You                 Fudan University  Sophomore    Male    74.0        N</span>
<span class="go">Xiaoli Qian              Tsinghua University   Freshman  Female    51.0        N</span>
<span class="go">Qiang Chu      Shanghai Jiao Tong University   Freshman  Female    52.0        N</span>
<span class="go">Gaoqiang Qian            Tsinghua University     Junior  Female    50.0        N</span>
</pre></div>
</div>
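<p>Passing a function is most useful in a method chain, where the intermediate table has no name of its own. The following is a minimal sketch on hypothetical toy data; the <code class="docutils literal notranslate"><span class="pre">BMI</span></code> column and the threshold of 20 are assumptions made for illustration:</p>

```python
import pandas as pd

# Hypothetical toy table used only for illustration.
toy = pd.DataFrame(
    {"Height": [1.60, 1.75, 1.82], "Weight": [46.0, 89.0, 70.0]},
    index=["Ann", "Bob", "Cat"],
)

# BMI exists only on the intermediate frame returned by assign, so the
# row filter reaches it through the function argument x.
result = (
    toy.assign(BMI=lambda x: x.Weight / x.Height ** 2)
       .loc[lambda x: x.BMI > 20, ["Weight", "BMI"]]
)

print(result)
```

<p>Here <code class="docutils literal notranslate"><span class="pre">x</span></code> is the table produced by the previous step of the chain rather than <code class="docutils literal notranslate"><span class="pre">toy</span></code> , which is why a plain boolean list built from <code class="docutils literal notranslate"><span class="pre">toy</span></code> would not work in this position.</p>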
<p>Finally, <code class="docutils literal notranslate"><span class="pre">loc</span></code> can also be used on a <code class="docutils literal notranslate"><span class="pre">Series</span></code> . It follows exactly the same rules as <code class="docutils literal notranslate"><span class="pre">loc[*]</span></code> for row selection on a <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> , so they are not repeated here.</p>
<div class="caution admonition">
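<p class="admonition-title">Do not use chained assignment</p>
<blockquote>
<div><p>When assigning to a table or a series, assign directly after a single indexer. If you index more than once before assigning, the value is written to a temporary <code class="docutils literal notranslate"><span class="pre">copy</span></code> rather than to the original object, so the elements are not actually modified and a <code class="docutils literal notranslate"><span class="pre">SettingWithCopyWarning</span></code> is raised. For example:</p>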
</div></blockquote>
</div>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [46]: </span><span class="n">df_chain</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">([[</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">],[</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">],[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">]],</span> <span class="n">columns</span><span class="o">=</span><span class="nb">list</span><span class="p">(</span><span class="s1">&#39;AB&#39;</span><span class="p">))</span>

<span class="gp">In [47]: </span><span class="n">df_chain</span>
<span class="gh">Out[47]: </span>
<span class="go">   A  B</span>
<span class="go">0  0  0</span>
<span class="go">1  1  0</span>
<span class="go">2 -1  0</span>

<span class="gp">In [48]: </span><span class="kn">import</span> <span class="nn">warnings</span>

<span class="gp">In [49]: </span><span class="k">with</span> <span class="n">warnings</span><span class="o">.</span><span class="n">catch_warnings</span><span class="p">():</span>
<span class="gp">   ....: </span>    <span class="n">warnings</span><span class="o">.</span><span class="n">filterwarnings</span><span class="p">(</span><span class="s1">&#39;error&#39;</span><span class="p">)</span>
<span class="gp">   ....: </span>    <span class="k">try</span><span class="p">:</span>
<span class="gp">   ....: </span>        <span class="n">df_chain</span><span class="p">[</span><span class="n">df_chain</span><span class="o">.</span><span class="n">A</span><span class="o">!=</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">B</span> <span class="o">=</span> <span class="mi">1</span> <span class="c1"># boolean row selection with [], followed by dot access to a column</span>
<span class="gp">   ....: </span>    <span class="k">except</span> <span class="ne">Warning</span> <span class="k">as</span> <span class="n">w</span><span class="p">:</span>
<span class="gp">   ....: </span>        <span class="n">Warning_Msg</span> <span class="o">=</span> <span class="n">w</span>
<span class="gp">   ....: </span>

<span class="gp">In [50]: </span><span class="nb">print</span><span class="p">(</span><span class="n">Warning_Msg</span><span class="p">)</span>

<span class="go">A value is trying to be set on a copy of a slice from a DataFrame.</span>
<span class="go">Try using .loc[row_indexer,col_indexer] = value instead</span>

<span class="go">See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy</span>

<span class="gp">In [51]: </span><span class="n">df_chain</span>
<span class="gh">Out[51]: </span>
<span class="go">   A  B</span>
<span class="go">0  0  0</span>
<span class="go">1  1  0</span>
<span class="go">2 -1  0</span>

<span class="gp">In [52]: </span><span class="n">df_chain</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="n">df_chain</span><span class="o">.</span><span class="n">A</span><span class="o">!=</span><span class="mi">0</span><span class="p">,</span><span class="s1">&#39;B&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>

<span class="gp">In [53]: </span><span class="n">df_chain</span>
<span class="gh">Out[53]: </span>
<span class="go">   A  B</span>
<span class="go">0  0  0</span>
<span class="go">1  1  1</span>
<span class="go">2 -1  1</span>
</pre></div>
</div>
</section>
<section id="iloc">
<h3>4. The iloc Indexer<a class="headerlink" href="#iloc" title="Permalink to this heading">#</a></h3>
<p><code class="docutils literal notranslate"><span class="pre">iloc</span></code> is used in much the same way as <code class="docutils literal notranslate"><span class="pre">loc</span></code> , except that it selects by position. The <code class="docutils literal notranslate"><span class="pre">*</span></code> slot again accepts five kinds of valid objects: an integer, a list of integers, an integer slice, a boolean list, and a function. The function must return one of the first four kinds, and its input is likewise the <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> itself.</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [54]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">]</span> <span class="c1"># second row, second column</span>
<span class="gh">Out[54]: </span><span class="go">&#39;Freshman&#39;</span>

<span class="gp">In [55]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">iloc</span><span class="p">[[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">]]</span> <span class="c1"># first two rows, first two columns</span>
<span class="gh">Out[55]: </span>
<span class="go">                                       School     Grade</span>
<span class="go">Name                                                   </span>
<span class="go">Gaopeng Yang    Shanghai Jiao Tong University  Freshman</span>
<span class="go">Changqiang You              Peking University  Freshman</span>

<span class="gp">In [56]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">2</span><span class="p">:</span><span class="mi">4</span><span class="p">]</span> <span class="c1"># the slice excludes the end point</span>
<span class="gh">Out[56]: </span>
<span class="go">                Gender  Weight</span>
<span class="go">Name                          </span>
<span class="go">Changqiang You    Male    70.0</span>
<span class="go">Mei Sun           Male    89.0</span>
<span class="go">Xiaojuan Sun    Female    41.0</span>

<span class="gp">In [57]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nb">slice</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">)]</span> <span class="c1"># pass a function that returns a slice</span>
<span class="gh">Out[57]: </span>
<span class="go">                                       School      Grade  Gender  Weight Transfer</span>
<span class="go">Name                                                                             </span>
<span class="go">Changqiang You              Peking University   Freshman    Male    70.0        N</span>
<span class="go">Mei Sun         Shanghai Jiao Tong University     Senior    Male    89.0        N</span>
<span class="go">Xiaojuan Sun                 Fudan University  Sophomore  Female    41.0        N</span>
</pre></div>
</div>
<p>Take special care with boolean lists here: you cannot pass a <code class="docutils literal notranslate"><span class="pre">Series</span></code> and must pass its <code class="docutils literal notranslate"><span class="pre">values</span></code> instead, otherwise an error is raised. For boolean filtering, <code class="docutils literal notranslate"><span class="pre">loc</span></code> should therefore be the first choice.</p>
<p>For example, select the students who weigh more than 80kg:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [58]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">iloc</span><span class="p">[(</span><span class="n">df_demo</span><span class="o">.</span><span class="n">Weight</span><span class="o">&gt;</span><span class="mi">80</span><span class="p">)</span><span class="o">.</span><span class="n">values</span><span class="p">]</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[58]: </span>
<span class="go">                                       School      Grade Gender  Weight Transfer</span>
<span class="go">Name                                                                            </span>
<span class="go">Mei Sun         Shanghai Jiao Tong University     Senior   Male    89.0        N</span>
<span class="go">Qiang Zheng     Shanghai Jiao Tong University     Senior   Male    87.0        N</span>
<span class="go">Qiang Han                   Peking University   Freshman   Male    87.0        N</span>
<span class="go">Chengpeng Zhou               Fudan University     Senior   Male    81.0        N</span>
<span class="go">Feng Han        Shanghai Jiao Tong University  Sophomore   Male    82.0        N</span>
</pre></div>
</div>
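<p>The contrast between the two indexers can be sketched on hypothetical toy data. The sketch uses <code class="docutils literal notranslate"><span class="pre">to_numpy()</span></code> , which returns the same underlying array as <code class="docutils literal notranslate"><span class="pre">values</span></code> for a boolean <code class="docutils literal notranslate"><span class="pre">Series</span></code> ; the exact exception type raised for a <code class="docutils literal notranslate"><span class="pre">Series</span></code> mask may depend on the index type and the pandas version, so the code only reports it:</p>

```python
import pandas as pd

# Hypothetical toy table used only for illustration.
toy = pd.DataFrame(
    {"Weight": [46.0, 89.0, 70.0, 87.0]},
    index=["Ann", "Bob", "Cat", "Dan"],
)
mask = toy.Weight > 80                   # boolean Series

by_loc = toy.loc[mask]                   # loc accepts the Series directly
by_iloc = toy.iloc[mask.to_numpy()]      # iloc is given a plain boolean array

# Passing the Series itself to iloc is expected to fail; report what happens.
try:
    toy.iloc[mask]
    print("iloc accepted the boolean Series")
except (NotImplementedError, ValueError) as err:
    print(type(err).__name__, err)

print(by_iloc)
```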
<p>For a <code class="docutils literal notranslate"><span class="pre">Series</span></code> , <code class="docutils literal notranslate"><span class="pre">iloc</span></code> likewise returns the value or sub-series at the given positions:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [59]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">School</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="gh">Out[59]: </span><span class="go">&#39;Peking University&#39;</span>

<span class="gp">In [60]: </span><span class="n">df_demo</span><span class="o">.</span><span class="n">School</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="mi">5</span><span class="p">:</span><span class="mi">2</span><span class="p">]</span>
<span class="gh">Out[60]: </span>
<span class="go">Name</span>
<span class="go">Changqiang You    Peking University</span>
<span class="go">Xiaojuan Sun       Fudan University</span>
<span class="go">Name: School, dtype: object</span>
</pre></div>
</div>
</section>
<section id="query">
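<h3>5. The query Method<a class="headerlink" href="#query" title="Permalink to this heading">#</a></h3>
<p><code class="docutils literal notranslate"><span class="pre">pandas</span></code> supports passing a query expression as a string to the <code class="docutils literal notranslate"><span class="pre">query</span></code> method to select data; the expression must evaluate to a boolean list. For complex selections, this style does not need to repeat the name of the <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> every time a column is referenced, so it generally shortens the code without reducing readability.</p>
<p>For example, the compound-condition query from the <code class="docutils literal notranslate"><span class="pre">loc</span></code> section can be rewritten as follows (applied here to <code class="docutils literal notranslate"><span class="pre">df</span></code> , in which <code class="docutils literal notranslate"><span class="pre">Name</span></code> is still an ordinary column):</p>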
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [61]: </span><span class="n">df</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="s1">&#39;((School == &quot;Fudan University&quot;)&amp;&#39;</span>
<span class="gp">   ....: </span>         <span class="s1">&#39; (Grade == &quot;Senior&quot;)&amp;&#39;</span>
<span class="gp">   ....: </span>         <span class="s1">&#39; (Weight &gt; 70))|&#39;</span>
<span class="gp">   ....: </span>         <span class="s1">&#39;((School == &quot;Peking University&quot;)&amp;&#39;</span>
<span class="gp">   ....: </span>         <span class="s1">&#39; (Grade != &quot;Senior&quot;)&amp;&#39;</span>
<span class="gp">   ....: </span>         <span class="s1">&#39; (Weight &gt; 80))&#39;</span><span class="p">)</span>
<span class="gp">   ....: </span>
<span class="gh">Out[61]: </span>
<span class="go">                School     Grade            Name Gender  Weight Transfer</span>
<span class="go">38   Peking University  Freshman       Qiang Han   Male    87.0        N</span>
<span class="go">66    Fudan University    Senior  Chengpeng Zhou   Male    81.0        N</span>
<span class="go">99   Peking University  Freshman  Changpeng Zhao   Male    83.0        N</span>
<span class="go">131   Fudan University    Senior  Chengpeng Qian   Male    73.0        Y</span>
</pre></div>
</div>
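<p>Inside a <code class="docutils literal notranslate"><span class="pre">query</span></code> expression, all column names of the <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> are registered for the user, and every method of the corresponding <code class="docutils literal notranslate"><span class="pre">Series</span></code> can be called just as in an ordinary function call. For example, select the students whose weight is above the mean:</p>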
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [62]: </span><span class="n">df</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="s1">&#39;Weight &gt; Weight.mean()&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[62]: </span>
<span class="go">                           School      Grade            Name  Gender  Weight Transfer</span>
<span class="go">1               Peking University   Freshman  Changqiang You    Male    70.0        N</span>
<span class="go">2   Shanghai Jiao Tong University     Senior         Mei Sun    Male    89.0        N</span>
<span class="go">4                Fudan University  Sophomore     Gaojuan You    Male    74.0        N</span>
<span class="go">10  Shanghai Jiao Tong University   Freshman   Xiaopeng Zhou    Male    74.0        N</span>
<span class="go">14            Tsinghua University     Senior    Xiaomei Zhou  Female    57.0        N</span>
</pre></div>
</div>
<div class="note admonition">
<p class="admonition-title">Referencing column names with spaces in query</p>
<blockquote>
<div><p>A column name that contains spaces must be referenced as <code class="docutils literal notranslate"><span class="pre">`col</span> <span class="pre">name`</span></code> , that is, wrapped in backticks.</p>
</div></blockquote>
</div>
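<p>A minimal sketch of the backtick syntax follows. The column name <code class="docutils literal notranslate"><span class="pre">body</span> <span class="pre">weight</span></code> and the data are hypothetical and chosen only because the name contains a space:</p>

```python
import pandas as pd

# Hypothetical toy table with a column name that contains a space.
toy = pd.DataFrame(
    {"body weight": [46.0, 89.0, 70.0],
     "Gender": ["Female", "Male", "Male"]}
)

# Backticks mark the whole name, space included, as one column reference.
heavy_males = toy.query('(`body weight` > 60) & (Gender == "Male")')

print(heavy_males)
```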
<p><code class="docutils literal notranslate"><span class="pre">query</span></code> also registers several English keywords to improve readability, for example <code class="docutils literal notranslate"><span class="pre">and,</span> <span class="pre">or,</span> <span class="pre">in,</span> <span class="pre">not</span> <span class="pre">in</span></code> . For example, select the male students who are neither freshmen nor sophomores:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [63]: </span><span class="n">df</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="s1">&#39;(Grade not in [&quot;Freshman&quot;, &quot;Sophomore&quot;]) and&#39;</span>
<span class="gp">   ....: </span>         <span class="s1">&#39;(Gender == &quot;Male&quot;)&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gp">   ....: </span>
<span class="gh">Out[63]: </span>
<span class="go">                           School   Grade           Name Gender  Weight Transfer</span>
<span class="go">2   Shanghai Jiao Tong University  Senior        Mei Sun   Male    89.0        N</span>
<span class="go">16            Tsinghua University  Junior  Xiaoqiang Qin   Male    68.0        N</span>
<span class="go">17            Tsinghua University  Junior      Peng Wang   Male    65.0        N</span>
<span class="go">18            Tsinghua University  Senior   Xiaofeng Sun   Male    71.0        N</span>
<span class="go">21  Shanghai Jiao Tong University  Senior  Xiaopeng Shen   Male    62.0      NaN</span>
</pre></div>
</div>
<p>In addition, when a column is compared with a list inside the string, <code class="docutils literal notranslate"><span class="pre">==</span></code> and <code class="docutils literal notranslate"><span class="pre">!=</span></code> mean that the element is or is not in the list, which is equivalent to <code class="docutils literal notranslate"><span class="pre">in</span></code> and <code class="docutils literal notranslate"><span class="pre">not</span> <span class="pre">in</span></code> . For example, select all juniors and seniors:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [64]: </span><span class="n">df</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="s1">&#39;Grade == [&quot;Junior&quot;, &quot;Senior&quot;]&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[64]: </span>
<span class="go">                           School   Grade           Name  Gender  Weight Transfer</span>
<span class="go">2   Shanghai Jiao Tong University  Senior        Mei Sun    Male    89.0        N</span>
<span class="go">7             Tsinghua University  Junior  Gaoqiang Qian  Female    50.0        N</span>
<span class="go">9               Peking University  Junior        Juan Xu  Female     NaN        N</span>
<span class="go">11            Tsinghua University  Junior    Xiaoquan Lv  Female    43.0        N</span>
<span class="go">12  Shanghai Jiao Tong University  Senior       Peng You  Female    48.0      NaN</span>
</pre></div>
</div>
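<p>The equivalence described above can be checked on a small, hypothetical table by comparing the results of the different spellings with the <code class="docutils literal notranslate"><span class="pre">isin</span></code> approach from the <code class="docutils literal notranslate"><span class="pre">loc</span></code> section:</p>

```python
import pandas as pd

# Hypothetical toy table used only for illustration.
toy = pd.DataFrame({"Grade": ["Freshman", "Junior", "Senior", "Sophomore"]})

# Three spellings of "Grade is one of Junior, Senior".
via_eq = toy.query('Grade == ["Junior", "Senior"]')
via_in = toy.query('Grade in ["Junior", "Senior"]')
via_isin = toy.loc[toy.Grade.isin(["Junior", "Senior"])]

# Two spellings of the complement.
via_ne = toy.query('Grade != ["Junior", "Senior"]')
via_not_in = toy.query('Grade not in ["Junior", "Senior"]')

print(via_eq)
print(via_ne)
```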
<p>To reference an external variable in a <code class="docutils literal notranslate"><span class="pre">query</span></code> string, prefix the variable name with the <code class="docutils literal notranslate"><span class="pre">&#64;</span></code> symbol. For example, select the students who weigh between 70kg and 80kg:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [65]: </span><span class="n">low</span><span class="p">,</span> <span class="n">high</span> <span class="o">=</span><span class="mi">70</span><span class="p">,</span> <span class="mi">80</span>

<span class="gp">In [66]: </span><span class="n">df</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="s1">&#39;(Weight &gt;= @low) &amp; (Weight &lt;= @high)&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[66]: </span>
<span class="go">                           School      Grade            Name Gender  Weight Transfer</span>
<span class="go">1               Peking University   Freshman  Changqiang You   Male    70.0        N</span>
<span class="go">4                Fudan University  Sophomore     Gaojuan You   Male    74.0        N</span>
<span class="go">10  Shanghai Jiao Tong University   Freshman   Xiaopeng Zhou   Male    74.0        N</span>
<span class="go">18            Tsinghua University     Senior    Xiaofeng Sun   Male    71.0        N</span>
<span class="go">35              Peking University   Freshman      Gaoli Zhao   Male    78.0        N</span>
</pre></div>
</div>
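<p>The two <code class="docutils literal notranslate"><span class="pre">query</span></code> features above can be checked on a small table. The following is a minimal sketch: the <code class="docutils literal notranslate"><span class="pre">students</span></code> frame and its rows are hypothetical and only mirror the column names used in this chapter.</p>

```python
import pandas as pd

# Hypothetical toy table (not the chapter's dataset).
students = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cai", "Dee"],
    "Grade": ["Freshman", "Junior", "Senior", "Sophomore"],
    "Weight": [48.0, 72.0, 79.0, 85.0],
})

upper_grades = ["Junior", "Senior"]
low, high = 70, 80

# Inside a query string, `==` against a list behaves like `in`.
by_eq = students.query('Grade == ["Junior", "Senior"]')
by_in = students.query('Grade in @upper_grades')

# `@name` looks the variable up in the surrounding Python scope.
in_range = students.query('(Weight >= @low) & (Weight <= @high)')

print(by_eq.equals(by_in))
print(in_range)
```

<p>Keeping the bounds in variables rather than hard-coding them in the string makes the same query reusable with different thresholds.</p>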
</section>
<section id="id5">
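<h3>6. Random Sampling<a class="headerlink" href="#id5" title="Permalink to this heading">#</a></h3>
<p>If each row of a <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> is viewed as a sample, or each column as a feature, and the whole <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> as the population, then the <code class="docutils literal notranslate"><span class="pre">sample</span></code> function draws a random sample of rows or columns. With a large dataset, computing statistics over the whole table just to get a rough picture of the distribution can be slow. Under simple random sampling (equal probability, without replacement), many sample statistics are unbiased estimators of the corresponding population quantities, for example the sample mean for the population mean. You can therefore draw a subset of the table first and compute an approximate estimate from it.</p>
<p>The main parameters of <code class="docutils literal notranslate"><span class="pre">sample</span></code> are <code class="docutils literal notranslate"><span class="pre">n,</span> <span class="pre">axis,</span> <span class="pre">frac,</span> <span class="pre">replace,</span> <span class="pre">weights</span></code>. The first three give the number of items to draw, the axis to sample along (0 for rows, 1 for columns), and the fraction to draw (0.3 draws 30% of the population).</p>
<p><code class="docutils literal notranslate"><span class="pre">replace</span></code> and <code class="docutils literal notranslate"><span class="pre">weights</span></code> control whether sampling is done with replacement and the relative sampling probability of each sample. Setting <code class="docutils literal notranslate"><span class="pre">replace</span> <span class="pre">=</span> <span class="pre">True</span></code> samples with replacement. For example, the code below draws 3 rows with replacement from <code class="docutils literal notranslate"><span class="pre">df_sample</span></code>, using the relative size of <code class="docutils literal notranslate"><span class="pre">value</span></code> as the sampling probability.</p>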
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [67]: </span><span class="n">df_sample</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s1">&#39;id&#39;</span><span class="p">:</span> <span class="nb">list</span><span class="p">(</span><span class="s1">&#39;abcde&#39;</span><span class="p">),</span>
<span class="gp">   ....: </span>                          <span class="s1">&#39;value&#39;</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">90</span><span class="p">]})</span>
<span class="gp">   ....: </span>

<span class="gp">In [68]: </span><span class="n">df_sample</span>
<span class="gh">Out[68]: </span>
<span class="go">  id  value</span>
<span class="go">0  a      1</span>
<span class="go">1  b      2</span>
<span class="go">2  c      3</span>
<span class="go">3  d      4</span>
<span class="go">4  e     90</span>

<span class="gp">In [69]: </span><span class="n">df_sample</span><span class="o">.</span><span class="n">sample</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="n">replace</span> <span class="o">=</span> <span class="kc">True</span><span class="p">,</span> <span class="n">weights</span> <span class="o">=</span> <span class="n">df_sample</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
<span class="gh">Out[69]: </span>
<span class="go">  id  value</span>
<span class="go">4  e     90</span>
<span class="go">4  e     90</span>
<span class="go">4  e     90</span>
</pre></div>
</div>
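<p>The remaining parameters can be exercised on the same small table. This is an illustrative sketch only; the <code class="docutils literal notranslate"><span class="pre">random_state</span></code> argument is added here for reproducibility and is not used in the example above.</p>

```python
import pandas as pd

df_sample = pd.DataFrame({"id": list("abcde"), "value": [1, 2, 3, 4, 90]})

# n: draw exactly 3 rows, without replacement by default.
three_rows = df_sample.sample(n=3, random_state=0)

# frac: draw 40% of the rows, i.e. 2 of the 5.
two_rows = df_sample.sample(frac=0.4, random_state=0)

# axis=1: sample columns instead of rows.
one_col = df_sample.sample(n=1, axis=1, random_state=0)

# weights may be given as a column name when sampling rows;
# the values are relative and do not need to sum to 1.
weighted = df_sample.sample(n=3, replace=True, weights="value", random_state=0)

print(three_rows.shape, two_rows.shape, one_col.shape, weighted.shape)
```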
</section>
</section>
<section id="id6">
<h2>II. Multi-Level Indexing<a class="headerlink" href="#id6" title="Permalink to this heading">#</a></h2>
<section id="id7">
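<h3>1. The Structure of a Multi-Level Index and Its Table<a class="headerlink" href="#id7" title="Permalink to this heading">#</a></h3>
<p>To show the structure of a <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> with a multi-level index more clearly, a new table is constructed below. Readers can ignore how it is built for now; the construction methods are explained in more detail in subsection 4.</p>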
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [70]: </span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>

<span class="gp">In [71]: </span><span class="n">multi_index</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">MultiIndex</span><span class="o">.</span><span class="n">from_product</span><span class="p">([</span><span class="nb">list</span><span class="p">(</span><span class="s1">&#39;ABCD&#39;</span><span class="p">),</span>
<span class="gp">   ....: </span>              <span class="n">df</span><span class="o">.</span><span class="n">Gender</span><span class="o">.</span><span class="n">unique</span><span class="p">()],</span> <span class="n">names</span><span class="o">=</span><span class="p">(</span><span class="s1">&#39;School&#39;</span><span class="p">,</span> <span class="s1">&#39;Gender&#39;</span><span class="p">))</span>
<span class="gp">   ....: </span>

<span class="gp">In [72]: </span><span class="n">multi_column</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">MultiIndex</span><span class="o">.</span><span class="n">from_product</span><span class="p">([[</span><span class="s1">&#39;Height&#39;</span><span class="p">,</span> <span class="s1">&#39;Weight&#39;</span><span class="p">],</span>
<span class="gp">   ....: </span>               <span class="n">df</span><span class="o">.</span><span class="n">Grade</span><span class="o">.</span><span class="n">unique</span><span class="p">()],</span> <span class="n">names</span><span class="o">=</span><span class="p">(</span><span class="s1">&#39;Indicator&#39;</span><span class="p">,</span> <span class="s1">&#39;Grade&#39;</span><span class="p">))</span>
<span class="gp">   ....: </span>

<span class="gp">In [73]: </span><span class="n">df_multi</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">c_</span><span class="p">[(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span><span class="mi">4</span><span class="p">)</span><span class="o">*</span><span class="mi">5</span> <span class="o">+</span> <span class="mi">163</span><span class="p">)</span><span class="o">.</span><span class="n">tolist</span><span class="p">(),</span>
<span class="gp">   ....: </span>                              <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span><span class="mi">4</span><span class="p">)</span><span class="o">*</span><span class="mi">5</span> <span class="o">+</span> <span class="mi">65</span><span class="p">)</span><span class="o">.</span><span class="n">tolist</span><span class="p">()],</span>
<span class="gp">   ....: </span>                        <span class="n">index</span> <span class="o">=</span> <span class="n">multi_index</span><span class="p">,</span>
<span class="gp">   ....: </span>                        <span class="n">columns</span> <span class="o">=</span> <span class="n">multi_column</span><span class="p">)</span><span class="o">.</span><span class="n">round</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="gp">   ....: </span>

<span class="gp">In [74]: </span><span class="n">df_multi</span>
<span class="gh">Out[74]: </span>
<span class="go">Indicator       Height                           Weight                        </span>
<span class="go">Grade         Freshman Senior Sophomore Junior Freshman Senior Sophomore Junior</span>
<span class="go">School Gender                                                                  </span>
<span class="go">A      Female    171.8  165.0     167.9  174.2     60.6   55.1      63.3   65.8</span>
<span class="go">       Male      172.3  158.1     167.8  162.2     71.2   71.0      63.1   63.5</span>
<span class="go">B      Female    162.5  165.1     163.7  170.3     59.8   57.9      56.5   74.8</span>
<span class="go">       Male      166.8  163.6     165.2  164.7     62.5   62.8      58.7   68.9</span>
<span class="go">C      Female    170.5  162.0     164.6  158.7     56.9   63.9      60.5   66.9</span>
<span class="go">       Male      150.2  166.3     167.3  159.3     62.4   59.1      64.9   67.1</span>
<span class="go">D      Female    174.3  155.7     163.2  162.1     65.3   66.5      61.8   63.2</span>
<span class="go">       Male      170.7  170.3     163.8  164.9     61.6   63.2      60.9   56.4</span>
</pre></div>
</div>
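<p>The figure below uses color to mark the structure of the <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code>. Like a table with a single-level index, it has three parts: the element values, the row index and the column index. Here both the row index and the column index are of type <code class="docutils literal notranslate"><span class="pre">MultiIndex</span></code>; the difference is that <span class="red">each element of the index is a tuple</span> rather than a scalar as in a single-level index. For example, the fourth element of the row index is <code class="docutils literal notranslate"><span class="pre">(&quot;B&quot;,</span> <span class="pre">&quot;Male&quot;)</span></code> and the second element of the column index is <code class="docutils literal notranslate"><span class="pre">(&quot;Height&quot;,</span> <span class="pre">&quot;Senior&quot;)</span></code>. Note that when the same value appears consecutively in an outer level, every occurrence after the first is hidden in the display to make the output easier to read.</p>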
<a class="reference internal image-reference" href="../_images/multi_index.png"><img alt="../_images/multi_index.png" class="align-center" src="../_images/multi_index.png" style="width: 700.0px; height: 262.0px;" /></a>
<p>Like a single-level index, a <code class="docutils literal notranslate"><span class="pre">MultiIndex</span></code> also has names. In the figure, <code class="docutils literal notranslate"><span class="pre">School</span></code> and <code class="docutils literal notranslate"><span class="pre">Gender</span></code> are the names of the first and second levels of the row index, and <code class="docutils literal notranslate"><span class="pre">Indicator</span></code> and <code class="docutils literal notranslate"><span class="pre">Grade</span></code> are the names of the first and second levels of the column index.</p>
<p>The names and values of an index are available through the <code class="docutils literal notranslate"><span class="pre">names</span></code> and <code class="docutils literal notranslate"><span class="pre">values</span></code> attributes:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [75]: </span><span class="n">df_multi</span><span class="o">.</span><span class="n">index</span><span class="o">.</span><span class="n">names</span>
<span class="gh">Out[75]: </span><span class="go">FrozenList([&#39;School&#39;, &#39;Gender&#39;])</span>

<span class="gp">In [76]: </span><span class="n">df_multi</span><span class="o">.</span><span class="n">columns</span><span class="o">.</span><span class="n">names</span>
<span class="gh">Out[76]: </span><span class="go">FrozenList([&#39;Indicator&#39;, &#39;Grade&#39;])</span>

<span class="gp">In [77]: </span><span class="n">df_multi</span><span class="o">.</span><span class="n">index</span><span class="o">.</span><span class="n">values</span>
<span class="gh">Out[77]: </span>
<span class="go">array([(&#39;A&#39;, &#39;Female&#39;), (&#39;A&#39;, &#39;Male&#39;), (&#39;B&#39;, &#39;Female&#39;), (&#39;B&#39;, &#39;Male&#39;),</span>
<span class="go">       (&#39;C&#39;, &#39;Female&#39;), (&#39;C&#39;, &#39;Male&#39;), (&#39;D&#39;, &#39;Female&#39;), (&#39;D&#39;, &#39;Male&#39;)],</span>
<span class="go">      dtype=object)</span>

<span class="gp">In [78]: </span><span class="n">df_multi</span><span class="o">.</span><span class="n">columns</span><span class="o">.</span><span class="n">values</span>
<span class="gh">Out[78]: </span>
<span class="go">array([(&#39;Height&#39;, &#39;Freshman&#39;), (&#39;Height&#39;, &#39;Senior&#39;),</span>
<span class="go">       (&#39;Height&#39;, &#39;Sophomore&#39;), (&#39;Height&#39;, &#39;Junior&#39;),</span>
<span class="go">       (&#39;Weight&#39;, &#39;Freshman&#39;), (&#39;Weight&#39;, &#39;Senior&#39;),</span>
<span class="go">       (&#39;Weight&#39;, &#39;Sophomore&#39;), (&#39;Weight&#39;, &#39;Junior&#39;)], dtype=object)</span>
</pre></div>
</div>
<p>To get the index values of a single level, use <code class="docutils literal notranslate"><span class="pre">get_level_values</span></code>:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [79]: </span><span class="n">df_multi</span><span class="o">.</span><span class="n">index</span><span class="o">.</span><span class="n">get_level_values</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="gh">Out[79]: </span><span class="go">Index([&#39;A&#39;, &#39;A&#39;, &#39;B&#39;, &#39;B&#39;, &#39;C&#39;, &#39;C&#39;, &#39;D&#39;, &#39;D&#39;], dtype=&#39;object&#39;, name=&#39;School&#39;)</span>
</pre></div>
</div>
<p>However, for any index, whether single-level or multi-level, users cannot modify an element with <code class="docutils literal notranslate"><span class="pre">index_obj[0]</span> <span class="pre">=</span> <span class="pre">item</span></code>, nor modify a name with <code class="docutils literal notranslate"><span class="pre">index_name[0]</span> <span class="pre">=</span> <span class="pre">new_name</span></code>. How to change these attributes is discussed in Section III.</p>
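<p>A short sketch of these points on a small, hypothetical <code class="docutils literal notranslate"><span class="pre">MultiIndex</span></code> (the variable <code class="docutils literal notranslate"><span class="pre">idx</span></code> and its labels are made up for illustration): each element is a tuple, a level can be read by name, and in-place assignment to elements or names is rejected.</p>

```python
import pandas as pd

# Hypothetical two-level index built for illustration.
idx = pd.MultiIndex.from_product([["A", "B"], ["Female", "Male"]],
                                 names=("School", "Gender"))

# Each element of a MultiIndex is a tuple.
first = idx[0]

# A single level can be selected by position or by name.
schools = idx.get_level_values("School")

# In-place assignment fails for both elements and names.
errors = []
try:
    idx[0] = ("C", "Female")
except TypeError as exc:
    errors.append(type(exc).__name__)
try:
    idx.names[0] = "Campus"
except TypeError as exc:
    errors.append(type(exc).__name__)

print(first, list(schools), errors)
```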
</section>
<section id="id8">
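<h3>2. The loc Indexer with a Multi-Level Index<a class="headerlink" href="#id8" title="Permalink to this heading">#</a></h3>
<p>With the structure in mind, return to the original table and set the school and grade columns as the index. The rows now have a multi-level index and the columns a single-level index. Because the column index has no name by default, the positions that held the index names <code class="docutils literal notranslate"><span class="pre">Indicator</span></code> and <code class="docutils literal notranslate"><span class="pre">Grade</span></code> in the previous figure are empty.</p>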
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [80]: </span><span class="n">df_multi</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">set_index</span><span class="p">([</span><span class="s1">&#39;School&#39;</span><span class="p">,</span> <span class="s1">&#39;Grade&#39;</span><span class="p">])</span>

<span class="gp">In [81]: </span><span class="n">df_multi</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[81]: </span>
<span class="go">                                                   Name  Gender  Weight Transfer</span>
<span class="go">School                        Grade                                             </span>
<span class="go">Shanghai Jiao Tong University Freshman     Gaopeng Yang  Female    46.0        N</span>
<span class="go">Peking University             Freshman   Changqiang You    Male    70.0        N</span>
<span class="go">Shanghai Jiao Tong University Senior            Mei Sun    Male    89.0        N</span>
<span class="go">Fudan University              Sophomore    Xiaojuan Sun  Female    41.0        N</span>
<span class="go">                              Sophomore     Gaojuan You    Male    74.0        N</span>
</pre></div>
</div>
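<p>Because a single element of a multi-level index is a tuple, the <code class="docutils literal notranslate"><span class="pre">loc</span></code> and <code class="docutils literal notranslate"><span class="pre">iloc</span></code> methods introduced in Section I carry over directly: wherever a scalar label was used, use the corresponding tuple instead.</p>
<p>When passing a list of tuples, a single tuple, or a function that returns either of these, sort the index first to avoid a performance warning:</p>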
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [82]: </span><span class="k">with</span> <span class="n">warnings</span><span class="o">.</span><span class="n">catch_warnings</span><span class="p">():</span>
<span class="gp">   ....: </span>    <span class="n">warnings</span><span class="o">.</span><span class="n">filterwarnings</span><span class="p">(</span><span class="s1">&#39;error&#39;</span><span class="p">)</span>
<span class="gp">   ....: </span>    <span class="k">try</span><span class="p">:</span>
<span class="gp">   ....: </span>        <span class="n">df_multi</span><span class="o">.</span><span class="n">loc</span><span class="p">[(</span><span class="s1">&#39;Fudan University&#39;</span><span class="p">,</span> <span class="s1">&#39;Junior&#39;</span><span class="p">)]</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gp">   ....: </span>    <span class="k">except</span> <span class="ne">Warning</span> <span class="k">as</span> <span class="n">w</span><span class="p">:</span>
<span class="gp">   ....: </span>        <span class="n">Warning_Msg</span> <span class="o">=</span> <span class="n">w</span>
<span class="gp">   ....: </span>

<span class="gp">In [83]: </span><span class="n">Warning_Msg</span>
<span class="gh">Out[83]: </span><span class="go">pandas.errors.PerformanceWarning(&#39;indexing past lexsort depth may impact performance.&#39;)</span>

<span class="gp">In [84]: </span><span class="n">df_sorted</span> <span class="o">=</span> <span class="n">df_multi</span><span class="o">.</span><span class="n">sort_index</span><span class="p">()</span>

<span class="gp">In [85]: </span><span class="n">df_sorted</span><span class="o">.</span><span class="n">loc</span><span class="p">[(</span><span class="s1">&#39;Fudan University&#39;</span><span class="p">,</span> <span class="s1">&#39;Junior&#39;</span><span class="p">)]</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[85]: </span>
<span class="go">                                  Name  Gender  Weight Transfer</span>
<span class="go">School           Grade                                         </span>
<span class="go">Fudan University Junior      Yanli You  Female    48.0        N</span>
<span class="go">                 Junior  Chunqiang Chu    Male    72.0        N</span>
<span class="go">                 Junior   Changfeng Lv    Male    76.0        N</span>
<span class="go">                 Junior     Yanjuan Lv  Female    49.0      NaN</span>
<span class="go">                 Junior  Gaoqiang Zhou  Female    43.0        N</span>

<span class="gp">In [86]: </span><span class="n">df_sorted</span><span class="o">.</span><span class="n">loc</span><span class="p">[[(</span><span class="s1">&#39;Fudan University&#39;</span><span class="p">,</span> <span class="s1">&#39;Senior&#39;</span><span class="p">),</span>
<span class="gp">   ....: </span>              <span class="p">(</span><span class="s1">&#39;Shanghai Jiao Tong University&#39;</span><span class="p">,</span> <span class="s1">&#39;Freshman&#39;</span><span class="p">)]]</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gp">   ....: </span>
<span class="gh">Out[86]: </span>
<span class="go">                                    Name  Gender  Weight Transfer</span>
<span class="go">School           Grade                                           </span>
<span class="go">Fudan University Senior  Chengpeng Zheng  Female    38.0        N</span>
<span class="go">                 Senior        Feng Zhou  Female    47.0        N</span>
<span class="go">                 Senior        Gaomei Lv  Female    34.0        N</span>
<span class="go">                 Senior        Chunli Lv  Female    56.0        N</span>
<span class="go">                 Senior   Chengpeng Zhou    Male    81.0        N</span>

<span class="gp">In [87]: </span><span class="n">df_sorted</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="n">df_sorted</span><span class="o">.</span><span class="n">Weight</span> <span class="o">&gt;</span> <span class="mi">70</span><span class="p">]</span><span class="o">.</span><span class="n">head</span><span class="p">()</span> <span class="c1"># a boolean list also works</span>
<span class="gh">Out[87]: </span>
<span class="go">                                     Name Gender  Weight Transfer</span>
<span class="go">School           Grade                                           </span>
<span class="go">Fudan University Freshman       Feng Wang   Male    74.0        N</span>
<span class="go">                 Junior     Chunqiang Chu   Male    72.0        N</span>
<span class="go">                 Junior      Changfeng Lv   Male    76.0        N</span>
<span class="go">                 Senior    Chengpeng Zhou   Male    81.0        N</span>
<span class="go">                 Senior    Chengpeng Qian   Male    73.0        Y</span>

<span class="gp">In [88]: </span><span class="n">df_sorted</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:(</span><span class="s1">&#39;Fudan University&#39;</span><span class="p">,</span><span class="s1">&#39;Junior&#39;</span><span class="p">)]</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[88]: </span>
<span class="go">                                  Name  Gender  Weight Transfer</span>
<span class="go">School           Grade                                         </span>
<span class="go">Fudan University Junior      Yanli You  Female    48.0        N</span>
<span class="go">                 Junior  Chunqiang Chu    Male    72.0        N</span>
<span class="go">                 Junior   Changfeng Lv    Male    76.0        N</span>
<span class="go">                 Junior     Yanjuan Lv  Female    49.0      NaN</span>
<span class="go">                 Junior  Gaoqiang Zhou  Female    43.0        N</span>
</pre></div>
</div>
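<p>Slicing needs some care. In a single-level index, a slice works as long as its endpoint elements are unique. In a multi-level index, the index must be sorted before slicing, regardless of whether any tuple appears more than once; otherwise an error is raised:</p>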
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [89]: </span><span class="k">try</span><span class="p">:</span>
<span class="gp">   ....: </span>    <span class="n">df_multi</span><span class="o">.</span><span class="n">loc</span><span class="p">[(</span><span class="s1">&#39;Fudan University&#39;</span><span class="p">,</span> <span class="s1">&#39;Senior&#39;</span><span class="p">):]</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gp">   ....: </span><span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="gp">   ....: </span>    <span class="n">Err_Msg</span> <span class="o">=</span> <span class="n">e</span>
<span class="gp">   ....: </span>

<span class="gp">In [90]: </span><span class="n">Err_Msg</span>
<span class="gh">Out[90]: </span><span class="go">pandas.errors.UnsortedIndexError(&#39;Key length (2) was greater than MultiIndex lexsort depth (0)&#39;)</span>

<span class="gp">In [91]: </span><span class="n">df_sorted</span><span class="o">.</span><span class="n">loc</span><span class="p">[(</span><span class="s1">&#39;Fudan University&#39;</span><span class="p">,</span> <span class="s1">&#39;Senior&#39;</span><span class="p">):]</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[91]: </span>
<span class="go">                                    Name  Gender  Weight Transfer</span>
<span class="go">School           Grade                                           </span>
<span class="go">Fudan University Senior  Chengpeng Zheng  Female    38.0        N</span>
<span class="go">                 Senior        Feng Zhou  Female    47.0        N</span>
<span class="go">                 Senior        Gaomei Lv  Female    34.0        N</span>
<span class="go">                 Senior        Chunli Lv  Female    56.0        N</span>
<span class="go">                 Senior   Chengpeng Zhou    Male    81.0        N</span>

<span class="gp">In [92]: </span><span class="n">df_unique</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">drop_duplicates</span><span class="p">(</span><span class="n">subset</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;School&#39;</span><span class="p">,</span><span class="s1">&#39;Grade&#39;</span><span class="p">]</span>
<span class="gp">   ....: </span>                              <span class="p">)</span><span class="o">.</span><span class="n">set_index</span><span class="p">([</span><span class="s1">&#39;School&#39;</span><span class="p">,</span> <span class="s1">&#39;Grade&#39;</span><span class="p">])</span>
<span class="gp">   ....: </span>

<span class="gp">In [93]: </span><span class="n">df_unique</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[93]: </span>
<span class="go">                                                   Name  Gender  Weight Transfer</span>
<span class="go">School                        Grade                                             </span>
<span class="go">Shanghai Jiao Tong University Freshman     Gaopeng Yang  Female    46.0        N</span>
<span class="go">Peking University             Freshman   Changqiang You    Male    70.0        N</span>
<span class="go">Shanghai Jiao Tong University Senior            Mei Sun    Male    89.0        N</span>
<span class="go">Fudan University              Sophomore    Xiaojuan Sun  Female    41.0        N</span>
<span class="go">Tsinghua University           Freshman      Xiaoli Qian  Female    51.0        N</span>

<span class="gp">In [94]: </span><span class="k">try</span><span class="p">:</span>
<span class="gp">   ....: </span>    <span class="n">df_unique</span><span class="o">.</span><span class="n">loc</span><span class="p">[(</span><span class="s1">&#39;Fudan University&#39;</span><span class="p">,</span> <span class="s1">&#39;Senior&#39;</span><span class="p">):]</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gp">   ....: </span><span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="gp">   ....: </span>    <span class="n">Err_Msg</span> <span class="o">=</span> <span class="n">e</span>
<span class="gp">   ....: </span>

<span class="gp">In [95]: </span><span class="n">Err_Msg</span>
<span class="gh">Out[95]: </span><span class="go">pandas.errors.UnsortedIndexError(&#39;Key length (2) was greater than MultiIndex lexsort depth (0)&#39;)</span>

<span class="gp">In [96]: </span><span class="n">df_unique</span><span class="o">.</span><span class="n">sort_index</span><span class="p">()</span><span class="o">.</span><span class="n">loc</span><span class="p">[(</span><span class="s1">&#39;Fudan University&#39;</span><span class="p">,</span> <span class="s1">&#39;Senior&#39;</span><span class="p">):]</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[96]: </span>
<span class="go">                                        Name  Gender  Weight Transfer</span>
<span class="go">School            Grade                                              </span>
<span class="go">Fudan University  Senior     Chengpeng Zheng  Female    38.0        N</span>
<span class="go">                  Sophomore     Xiaojuan Sun  Female    41.0        N</span>
<span class="go">Peking University Freshman    Changqiang You    Male    70.0        N</span>
<span class="go">                  Junior             Juan Xu  Female     NaN        N</span>
<span class="go">                  Senior          Changli Lv  Female    41.0        N</span>
</pre></div>
</div>
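<p>The same behavior can be reproduced on a four-row table. The sketch below assumes a hypothetical frame <code class="docutils literal notranslate"><span class="pre">df_small</span></code> whose index tuples are unique but unsorted; slicing fails until <code class="docutils literal notranslate"><span class="pre">sort_index</span></code> is applied.</p>

```python
import pandas as pd
from pandas.errors import UnsortedIndexError

# Hypothetical frame: unique index tuples, deliberately out of order.
idx = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 1), ("b", 1), ("a", 2)], names=("outer", "inner"))
df_small = pd.DataFrame({"v": [10, 20, 30, 40]}, index=idx)

# Slicing the unsorted index raises, even though no tuple repeats.
try:
    df_small.loc[("a", 2):]
    raised = False
except UnsortedIndexError:
    raised = True

# After sorting, the slice runs from ("a", 2) to the end.
df_small_sorted = df_small.sort_index()
tail = df_small_sorted.loc[("a", 2):]

print(raised)
print(tail)
```

<p>Checking <code class="docutils literal notranslate"><span class="pre">df_small_sorted.index.is_monotonic_increasing</span></code> is one way to confirm that an index is ready for slicing.</p>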
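<p>Tuples in a multi-level index also have a special use: they can select the cross product of elements from several levels. In this case the columns passed to <code class="docutils literal notranslate"><span class="pre">loc</span></code> must be specified as well, with <code class="docutils literal notranslate"><span class="pre">:</span></code> selecting all of them. The elements to select at each level are placed in a list, so the form passed to <code class="docutils literal notranslate"><span class="pre">loc</span></code> is <code class="docutils literal notranslate"><span class="pre">[(level_0_list,</span> <span class="pre">level_1_list),</span> <span class="pre">cols]</span></code>. For example, to get all sophomores and juniors at Peking University and Fudan University:</p>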
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [97]: </span><span class="n">res</span> <span class="o">=</span> <span class="n">df_multi</span><span class="o">.</span><span class="n">loc</span><span class="p">[([</span><span class="s1">&#39;Peking University&#39;</span><span class="p">,</span> <span class="s1">&#39;Fudan University&#39;</span><span class="p">],</span>
<span class="gp">   ....: </span>                    <span class="p">[</span><span class="s1">&#39;Sophomore&#39;</span><span class="p">,</span> <span class="s1">&#39;Junior&#39;</span><span class="p">]),</span> <span class="p">:]</span>
<span class="gp">   ....: </span>

<span class="gp">In [98]: </span><span class="n">res</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[98]: </span>
<span class="go">                                     Name  Gender  Weight Transfer</span>
<span class="go">School            Grade                                           </span>
<span class="go">Peking University Sophomore   Changmei Xu  Female    43.0        N</span>
<span class="go">                  Sophomore  Xiaopeng Qin    Male     NaN        N</span>
<span class="go">                  Sophomore        Mei Xu  Female    39.0        N</span>
<span class="go">                  Sophomore   Xiaoli Zhou  Female    55.0        N</span>
<span class="go">                  Sophomore      Peng Han  Female    34.0      NaN</span>

<span class="gp">In [99]: </span><span class="n">res</span><span class="o">.</span><span class="n">shape</span>
<span class="gh">Out[99]: </span><span class="go">(33, 4)</span>
</pre></div>
</div>
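<p>The statement below looks similar, but it passes a list of elements (tuples, in this case), so its meaning is different: it selects the juniors at Peking University and the sophomores at Fudan University:</p>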
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [100]: </span><span class="n">res</span> <span class="o">=</span> <span class="n">df_multi</span><span class="o">.</span><span class="n">loc</span><span class="p">[[(</span><span class="s1">&#39;Peking University&#39;</span><span class="p">,</span> <span class="s1">&#39;Junior&#39;</span><span class="p">),</span>
<span class="gp">   .....: </span>                    <span class="p">(</span><span class="s1">&#39;Fudan University&#39;</span><span class="p">,</span> <span class="s1">&#39;Sophomore&#39;</span><span class="p">)]]</span>
<span class="gp">   .....: </span>

<span class="gp">In [101]: </span><span class="n">res</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[101]: </span>
<span class="go">                                   Name  Gender  Weight Transfer</span>
<span class="go">School            Grade                                         </span>
<span class="go">Peking University Junior        Juan Xu  Female     NaN        N</span>
<span class="go">                  Junior  Changjuan You  Female    47.0        N</span>
<span class="go">                  Junior       Gaoli Xu  Female    48.0        N</span>
<span class="go">                  Junior   Gaoquan Zhou    Male    70.0        N</span>
<span class="go">                  Junior      Qiang You  Female    56.0        N</span>

<span class="gp">In [102]: </span><span class="n">res</span><span class="o">.</span><span class="n">shape</span>
<span class="gh">Out[102]: </span><span class="go">(16, 4)</span>
</pre></div>
</div>
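<p>The difference between the two selection styles can be sketched on a tiny table. The frame <code class="docutils literal notranslate"><span class="pre">df_small</span></code> and its values are hypothetical stand-ins for <code class="docutils literal notranslate"><span class="pre">df_multi</span></code>, and the sketch assumes the earlier statement passed a tuple of lists:</p>

```python
import pandas as pd

# Hypothetical miniature of df_multi: two index levels, one data column.
index = pd.MultiIndex.from_tuples(
    [("Fudan University", "Junior"),
     ("Fudan University", "Sophomore"),
     ("Peking University", "Junior"),
     ("Peking University", "Sophomore")],
    names=["School", "Grade"],
)
df_small = pd.DataFrame({"Weight": [60.0, 52.0, 47.0, 43.0]}, index=index)

# A tuple of lists selects every combination of the listed labels (2 x 2 = 4 rows).
cross = df_small.loc[(["Peking University", "Fudan University"],
                      ["Sophomore", "Junior"]), :]

# A list of tuples selects only the listed (School, Grade) pairs (2 rows).
pairs = df_small.loc[[("Peking University", "Junior"),
                      ("Fudan University", "Sophomore")]]

print(len(cross), len(pairs))
```

<p>In this sketch the tuple of lists behaves like a cross product over the levels, while the list of tuples names each row group explicitly.</p>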
</section>
<section id="indexslice">
<h3>3. The IndexSlice Object<a class="headerlink" href="#indexslice" title="Permalink to this heading">#</a></h3>
<p>Even when the index has no duplicates, the methods introduced so far can only slice on whole tuples. They cannot slice each level separately, and they do not allow slices to be mixed with boolean lists. The <code class="docutils literal notranslate"><span class="pre">IndexSlice</span></code> object solves this problem. <code class="docutils literal notranslate"><span class="pre">IndexSlice</span></code> can be used in two forms: the first is <code class="docutils literal notranslate"><span class="pre">loc[idx[*,*]]</span></code> and the second is <code class="docutils literal notranslate"><span class="pre">loc[idx[*,*],idx[*,*]]</span></code>; both are introduced below. For ease of demonstration, we first construct a <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> <span class="red">whose index has no duplicates</span>:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [103]: </span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>

<span class="gp">In [104]: </span><span class="n">L1</span><span class="p">,</span><span class="n">L2</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;A&#39;</span><span class="p">,</span><span class="s1">&#39;B&#39;</span><span class="p">,</span><span class="s1">&#39;C&#39;</span><span class="p">],[</span><span class="s1">&#39;a&#39;</span><span class="p">,</span><span class="s1">&#39;b&#39;</span><span class="p">,</span><span class="s1">&#39;c&#39;</span><span class="p">]</span>

<span class="gp">In [105]: </span><span class="n">mul_index1</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">MultiIndex</span><span class="o">.</span><span class="n">from_product</span><span class="p">([</span><span class="n">L1</span><span class="p">,</span><span class="n">L2</span><span class="p">],</span><span class="n">names</span><span class="o">=</span><span class="p">(</span><span class="s1">&#39;Upper&#39;</span><span class="p">,</span> <span class="s1">&#39;Lower&#39;</span><span class="p">))</span>

<span class="gp">In [106]: </span><span class="n">L3</span><span class="p">,</span><span class="n">L4</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;D&#39;</span><span class="p">,</span><span class="s1">&#39;E&#39;</span><span class="p">,</span><span class="s1">&#39;F&#39;</span><span class="p">],[</span><span class="s1">&#39;d&#39;</span><span class="p">,</span><span class="s1">&#39;e&#39;</span><span class="p">,</span><span class="s1">&#39;f&#39;</span><span class="p">]</span>

<span class="gp">In [107]: </span><span class="n">mul_index2</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">MultiIndex</span><span class="o">.</span><span class="n">from_product</span><span class="p">([</span><span class="n">L3</span><span class="p">,</span><span class="n">L4</span><span class="p">],</span><span class="n">names</span><span class="o">=</span><span class="p">(</span><span class="s1">&#39;Big&#39;</span><span class="p">,</span> <span class="s1">&#39;Small&#39;</span><span class="p">))</span>

<span class="gp">In [108]: </span><span class="n">df_ex</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="o">-</span><span class="mi">9</span><span class="p">,</span><span class="mi">10</span><span class="p">,(</span><span class="mi">9</span><span class="p">,</span><span class="mi">9</span><span class="p">)),</span>
<span class="gp">   .....: </span>                    <span class="n">index</span><span class="o">=</span><span class="n">mul_index1</span><span class="p">,</span>
<span class="gp">   .....: </span>                    <span class="n">columns</span><span class="o">=</span><span class="n">mul_index2</span><span class="p">)</span>
<span class="gp">   .....: </span>

<span class="gp">In [109]: </span><span class="n">df_ex</span>
<span class="gh">Out[109]: </span>
<span class="go">Big          D        E        F      </span>
<span class="go">Small        d  e  f  d  e  f  d  e  f</span>
<span class="go">Upper Lower                           </span>
<span class="go">A     a      3  6 -9 -6 -6 -2  0  9 -5</span>
<span class="go">      b     -3  3 -8 -3 -2  5  8 -4  4</span>
<span class="go">      c     -1  0  7 -4  6  6 -9  9 -6</span>
<span class="go">B     a      8  5 -2 -9 -8  0 -9  1 -6</span>
<span class="go">      b      2  9 -7 -9 -9 -5 -4 -3 -1</span>
<span class="go">      c      8  6 -5  0  1 -8 -8 -2  0</span>
<span class="go">C     a     -6 -3  2  5  9 -9  5 -6  3</span>
<span class="go">      b      1  2 -5 -3 -5  6 -6  3 -5</span>
<span class="go">      c     -1  5  6 -6  6  4  7  8 -4</span>
</pre></div>
</div>
<p>To use the <code class="docutils literal notranslate"><span class="pre">IndexSlice</span></code> object, first bind it to a short name:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [110]: </span><span class="n">idx</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">IndexSlice</span>
</pre></div>
</div>
<p>[a] The <code class="docutils literal notranslate"><span class="pre">loc[idx[*,*]]</span></code> form</p>
<p>This form cannot slice the levels separately: the first <code class="docutils literal notranslate"><span class="pre">*</span></code> selects rows and the second <code class="docutils literal notranslate"><span class="pre">*</span></code> selects columns, just as with plain <code class="docutils literal notranslate"><span class="pre">loc</span></code>:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [111]: </span><span class="n">df_ex</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="n">idx</span><span class="p">[</span><span class="s1">&#39;C&#39;</span><span class="p">:,</span> <span class="p">(</span><span class="s1">&#39;D&#39;</span><span class="p">,</span> <span class="s1">&#39;f&#39;</span><span class="p">):]]</span>
<span class="gh">Out[111]: </span>
<span class="go">Big          D  E        F      </span>
<span class="go">Small        f  d  e  f  d  e  f</span>
<span class="go">Upper Lower                     </span>
<span class="go">C     a      2  5  9 -9  5 -6  3</span>
<span class="go">      b     -5 -3 -5  6 -6  3 -5</span>
<span class="go">      c      6 -6  6  4  7  8 -4</span>
</pre></div>
</div>
<p>In addition, indexing with a boolean sequence is supported; here it is produced by a function that receives the whole <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code>:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [112]: </span><span class="n">df_ex</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="n">idx</span><span class="p">[:</span><span class="s1">&#39;A&#39;</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span><span class="n">x</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span><span class="o">&gt;</span><span class="mi">0</span><span class="p">]]</span> <span class="c1"># columns whose sum is positive</span>
<span class="gh">Out[112]: </span>
<span class="go">Big          D     F</span>
<span class="go">Small        d  e  e</span>
<span class="go">Upper Lower         </span>
<span class="go">A     a      3  6  9</span>
<span class="go">      b     -3  3 -4</span>
<span class="go">      c     -1  0  9</span>
</pre></div>
</div>
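<p>A minimal sketch of what the callable does: it is evaluated on the whole table, so the column sums cover all nine rows, not only the rows kept by the row slice. The frame <code class="docutils literal notranslate"><span class="pre">df_demo</span></code> uses deterministic stand-in values (an assumption made for illustration), not the random data above:</p>

```python
import numpy as np
import pandas as pd

idx = pd.IndexSlice
rows = pd.MultiIndex.from_product([list("ABC"), list("abc")], names=("Upper", "Lower"))
cols = pd.MultiIndex.from_product([list("DEF"), list("def")], names=("Big", "Small"))
# Deterministic stand-in values: column j sums to 9*j - 36 over the nine rows.
df_demo = pd.DataFrame(np.arange(81).reshape(9, 9) - 40, index=rows, columns=cols)

# Boolean Series aligned with the columns, computed over every row.
mask = df_demo.sum() > 0

via_callable = df_demo.loc[idx[:"A", lambda x: x.sum() > 0]]
via_mask = df_demo.loc[:"A", mask]

print(via_callable.equals(via_mask))
```

<p>Precomputing the mask gives the same selection and makes it explicit which rows the condition was evaluated on.</p>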
<p>[b] The <code class="docutils literal notranslate"><span class="pre">loc[idx[*,*],idx[*,*]]</span></code> form</p>
<p>This form can slice level by level: the first <code class="docutils literal notranslate"><span class="pre">idx</span></code> refers to the row index and the second to the column index.</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [113]: </span><span class="n">df_ex</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="n">idx</span><span class="p">[:</span><span class="s1">&#39;A&#39;</span><span class="p">,</span> <span class="s1">&#39;b&#39;</span><span class="p">:],</span> <span class="n">idx</span><span class="p">[</span><span class="s1">&#39;E&#39;</span><span class="p">:,</span> <span class="s1">&#39;e&#39;</span><span class="p">:]]</span>
<span class="gh">Out[113]: </span>
<span class="go">Big          E     F   </span>
<span class="go">Small        e  f  e  f</span>
<span class="go">Upper Lower            </span>
<span class="go">A     b     -2  5 -4  4</span>
<span class="go">      c      6  6  9 -6</span>
</pre></div>
</div>
<p>Note, however, that functions are not supported in this form:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [114]: </span><span class="k">try</span><span class="p">:</span>
<span class="gp">   .....: </span>    <span class="n">df_ex</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="n">idx</span><span class="p">[:</span><span class="s1">&#39;A&#39;</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="s1">&#39;b&#39;</span><span class="p">],</span> <span class="n">idx</span><span class="p">[</span><span class="s1">&#39;E&#39;</span><span class="p">:,</span> <span class="s1">&#39;e&#39;</span><span class="p">:]]</span>
<span class="gp">   .....: </span><span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="gp">   .....: </span>    <span class="n">Err_Msg</span> <span class="o">=</span> <span class="n">e</span>
<span class="gp">   .....: </span>

<span class="gp">In [115]: </span><span class="n">Err_Msg</span>
<span class="gh">Out[115]: </span><span class="go">KeyError(&lt;function __main__.&lt;lambda&gt;(x)&gt;)</span>
</pre></div>
</div>
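<p>One possible workaround, sketched under the assumption that the condition can be evaluated ahead of time, is to compute the wanted labels first and pass them as a list for that level. The frame <code class="docutils literal notranslate"><span class="pre">df_demo</span></code> and the name <code class="docutils literal notranslate"><span class="pre">lower_labels</span></code> are hypothetical:</p>

```python
import numpy as np
import pandas as pd

idx = pd.IndexSlice
rows = pd.MultiIndex.from_product([list("ABC"), list("abc")], names=("Upper", "Lower"))
cols = pd.MultiIndex.from_product([list("DEF"), list("def")], names=("Big", "Small"))
# Deterministic stand-in values, not the random data used in the text.
df_demo = pd.DataFrame(np.arange(81).reshape(9, 9) - 40, index=rows, columns=cols)

# Evaluate the condition up front instead of passing a callable inside idx.
lower_labels = [label
                for label in df_demo.index.get_level_values("Lower").unique()
                if label >= "b"]

# A list of labels is accepted for a single level alongside slices on the others.
res = df_demo.loc[idx[:"A", lower_labels], idx["E":, "e":]]
print(res)
```

<p>Here the list plays the role the function was meant to play, and the remaining levels are still sliced as in the example above.</p>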
</section>
<section id="id9">
<h3>4. Constructing a MultiIndex<a class="headerlink" href="#id9" title="Permalink to this heading">#</a></h3>
<p>The structure and slicing of tables with a multi-level index were covered above. Apart from using <code class="docutils literal notranslate"><span class="pre">set_index</span></code>, how can you construct a multi-level index yourself? Three methods are commonly used: <code class="docutils literal notranslate"><span class="pre">from_tuples,</span> <span class="pre">from_arrays,</span> <span class="pre">from_product</span></code>. All of them are class methods of <code class="docutils literal notranslate"><span class="pre">pd.MultiIndex</span></code>.</p>
<p><code class="docutils literal notranslate"><span class="pre">from_tuples</span></code> builds the index from a list of tuples:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [116]: </span><span class="n">my_tuple</span> <span class="o">=</span> <span class="p">[(</span><span class="s1">&#39;a&#39;</span><span class="p">,</span><span class="s1">&#39;cat&#39;</span><span class="p">),(</span><span class="s1">&#39;a&#39;</span><span class="p">,</span><span class="s1">&#39;dog&#39;</span><span class="p">),(</span><span class="s1">&#39;b&#39;</span><span class="p">,</span><span class="s1">&#39;cat&#39;</span><span class="p">),(</span><span class="s1">&#39;b&#39;</span><span class="p">,</span><span class="s1">&#39;dog&#39;</span><span class="p">)]</span>

<span class="gp">In [117]: </span><span class="n">pd</span><span class="o">.</span><span class="n">MultiIndex</span><span class="o">.</span><span class="n">from_tuples</span><span class="p">(</span><span class="n">my_tuple</span><span class="p">,</span> <span class="n">names</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;First&#39;</span><span class="p">,</span><span class="s1">&#39;Second&#39;</span><span class="p">])</span>
<span class="gh">Out[117]: </span>
<span class="go">MultiIndex([(&#39;a&#39;, &#39;cat&#39;),</span>
<span class="go">            (&#39;a&#39;, &#39;dog&#39;),</span>
<span class="go">            (&#39;b&#39;, &#39;cat&#39;),</span>
<span class="go">            (&#39;b&#39;, &#39;dog&#39;)],</span>
<span class="go">           names=[&#39;First&#39;, &#39;Second&#39;])</span>
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">from_arrays</span></code> builds the index from a list of lists, one list per level:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [118]: </span><span class="n">my_array</span> <span class="o">=</span> <span class="p">[</span><span class="nb">list</span><span class="p">(</span><span class="s1">&#39;aabb&#39;</span><span class="p">),</span> <span class="p">[</span><span class="s1">&#39;cat&#39;</span><span class="p">,</span> <span class="s1">&#39;dog&#39;</span><span class="p">]</span><span class="o">*</span><span class="mi">2</span><span class="p">]</span>

<span class="gp">In [119]: </span><span class="n">pd</span><span class="o">.</span><span class="n">MultiIndex</span><span class="o">.</span><span class="n">from_arrays</span><span class="p">(</span><span class="n">my_array</span><span class="p">,</span> <span class="n">names</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;First&#39;</span><span class="p">,</span><span class="s1">&#39;Second&#39;</span><span class="p">])</span>
<span class="gh">Out[119]: </span>
<span class="go">MultiIndex([(&#39;a&#39;, &#39;cat&#39;),</span>
<span class="go">            (&#39;a&#39;, &#39;dog&#39;),</span>
<span class="go">            (&#39;b&#39;, &#39;cat&#39;),</span>
<span class="go">            (&#39;b&#39;, &#39;dog&#39;)],</span>
<span class="go">           names=[&#39;First&#39;, &#39;Second&#39;])</span>
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">from_product</span></code> builds the index from the Cartesian product of the given lists:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [120]: </span><span class="n">my_list1</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;a&#39;</span><span class="p">,</span><span class="s1">&#39;b&#39;</span><span class="p">]</span>

<span class="gp">In [121]: </span><span class="n">my_list2</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;cat&#39;</span><span class="p">,</span><span class="s1">&#39;dog&#39;</span><span class="p">]</span>

<span class="gp">In [122]: </span><span class="n">pd</span><span class="o">.</span><span class="n">MultiIndex</span><span class="o">.</span><span class="n">from_product</span><span class="p">([</span><span class="n">my_list1</span><span class="p">,</span>
<span class="gp">   .....: </span>                            <span class="n">my_list2</span><span class="p">],</span>
<span class="gp">   .....: </span>                           <span class="n">names</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;First&#39;</span><span class="p">,</span><span class="s1">&#39;Second&#39;</span><span class="p">])</span>
<span class="gp">   .....: </span>
<span class="gh">Out[122]: </span>
<span class="go">MultiIndex([(&#39;a&#39;, &#39;cat&#39;),</span>
<span class="go">            (&#39;a&#39;, &#39;dog&#39;),</span>
<span class="go">            (&#39;b&#39;, &#39;cat&#39;),</span>
<span class="go">            (&#39;b&#39;, &#39;dog&#39;)],</span>
<span class="go">           names=[&#39;First&#39;, &#39;Second&#39;])</span>
</pre></div>
</div>
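<p>The three constructors above produce the same index for this input, which a short check can confirm. The variable names below are chosen for illustration only:</p>

```python
import pandas as pd

names = ["First", "Second"]

mi_tuples = pd.MultiIndex.from_tuples(
    [("a", "cat"), ("a", "dog"), ("b", "cat"), ("b", "dog")], names=names)
mi_arrays = pd.MultiIndex.from_arrays(
    [list("aabb"), ["cat", "dog"] * 2], names=names)
mi_product = pd.MultiIndex.from_product(
    [["a", "b"], ["cat", "dog"]], names=names)

# All three describe the same four (First, Second) pairs.
same = mi_tuples.equals(mi_arrays) and mi_arrays.equals(mi_product)

# from_product always yields every combination, so an index with missing
# combinations has to come from from_tuples or from_arrays instead.
irregular = pd.MultiIndex.from_tuples([("a", "cat"), ("b", "dog")], names=names)

print(same, len(irregular))
```

<p>Which constructor is most convenient depends on the shape of the data at hand: row-wise pairs, one array per level, or a full grid.</p>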
</section>
</section>
<section id="id10">
<h2>III. Common Index Methods<a class="headerlink" href="#id10" title="Permalink to this heading">#</a></h2>
<section id="id11">
<h3>1. Swapping and Dropping Index Levels<a class="headerlink" href="#id11" title="Permalink to this heading">#</a></h3>
<p>To make the swapping process easier to follow, we construct an example with a three-level index:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [123]: </span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>

<span class="gp">In [124]: </span><span class="n">L1</span><span class="p">,</span><span class="n">L2</span><span class="p">,</span><span class="n">L3</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;A&#39;</span><span class="p">,</span><span class="s1">&#39;B&#39;</span><span class="p">],[</span><span class="s1">&#39;a&#39;</span><span class="p">,</span><span class="s1">&#39;b&#39;</span><span class="p">],[</span><span class="s1">&#39;alpha&#39;</span><span class="p">,</span><span class="s1">&#39;beta&#39;</span><span class="p">]</span>

<span class="gp">In [125]: </span><span class="n">mul_index1</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">MultiIndex</span><span class="o">.</span><span class="n">from_product</span><span class="p">([</span><span class="n">L1</span><span class="p">,</span><span class="n">L2</span><span class="p">,</span><span class="n">L3</span><span class="p">],</span>
<span class="gp">   .....: </span>             <span class="n">names</span><span class="o">=</span><span class="p">(</span><span class="s1">&#39;Upper&#39;</span><span class="p">,</span> <span class="s1">&#39;Lower&#39;</span><span class="p">,</span><span class="s1">&#39;Extra&#39;</span><span class="p">))</span>
<span class="gp">   .....: </span>

<span class="gp">In [126]: </span><span class="n">L4</span><span class="p">,</span><span class="n">L5</span><span class="p">,</span><span class="n">L6</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;C&#39;</span><span class="p">,</span><span class="s1">&#39;D&#39;</span><span class="p">],[</span><span class="s1">&#39;c&#39;</span><span class="p">,</span><span class="s1">&#39;d&#39;</span><span class="p">],[</span><span class="s1">&#39;cat&#39;</span><span class="p">,</span><span class="s1">&#39;dog&#39;</span><span class="p">]</span>

<span class="gp">In [127]: </span><span class="n">mul_index2</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">MultiIndex</span><span class="o">.</span><span class="n">from_product</span><span class="p">([</span><span class="n">L4</span><span class="p">,</span><span class="n">L5</span><span class="p">,</span><span class="n">L6</span><span class="p">],</span>
<span class="gp">   .....: </span>             <span class="n">names</span><span class="o">=</span><span class="p">(</span><span class="s1">&#39;Big&#39;</span><span class="p">,</span> <span class="s1">&#39;Small&#39;</span><span class="p">,</span> <span class="s1">&#39;Other&#39;</span><span class="p">))</span>
<span class="gp">   .....: </span>

<span class="gp">In [128]: </span><span class="n">df_ex</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="o">-</span><span class="mi">9</span><span class="p">,</span><span class="mi">10</span><span class="p">,(</span><span class="mi">8</span><span class="p">,</span><span class="mi">8</span><span class="p">)),</span>
<span class="gp">   .....: </span>                        <span class="n">index</span><span class="o">=</span><span class="n">mul_index1</span><span class="p">,</span>
<span class="gp">   .....: </span>                        <span class="n">columns</span><span class="o">=</span><span class="n">mul_index2</span><span class="p">)</span>
<span class="gp">   .....: </span>

<span class="gp">In [129]: </span><span class="n">df_ex</span>
<span class="gh">Out[129]: </span>
<span class="go">Big                 C               D            </span>
<span class="go">Small               c       d       c       d    </span>
<span class="go">Other             cat dog cat dog cat dog cat dog</span>
<span class="go">Upper Lower Extra                                </span>
<span class="go">A     a     alpha   3   6  -9  -6  -6  -2   0   9</span>
<span class="go">            beta   -5  -3   3  -8  -3  -2   5   8</span>
<span class="go">      b     alpha  -4   4  -1   0   7  -4   6   6</span>
<span class="go">            beta   -9   9  -6   8   5  -2  -9  -8</span>
<span class="go">B     a     alpha   0  -9   1  -6   2   9  -7  -9</span>
<span class="go">            beta   -9  -5  -4  -3  -1   8   6  -5</span>
<span class="go">      b     alpha   0   1  -8  -8  -2   0  -6  -3</span>
<span class="go">            beta    2   5   9  -9   5  -6   3   1</span>
</pre></div>
</div>
<p>Index levels are swapped with <code class="docutils literal notranslate"><span class="pre">swaplevel</span></code> and <code class="docutils literal notranslate"><span class="pre">reorder_levels</span></code>. The former can only swap two levels, while the latter can rearrange any number of levels. Both let you specify which axis to operate on, i.e. the row index or the column index:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [130]: </span><span class="n">df_ex</span><span class="o">.</span><span class="n">swaplevel</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span> <span class="c1"># swap the first and third levels of the column index</span>
<span class="gh">Out[130]: </span>
<span class="go">Other             cat dog cat dog cat dog cat dog</span>
<span class="go">Small               c   c   d   d   c   c   d   d</span>
<span class="go">Big                 C   C   C   C   D   D   D   D</span>
<span class="go">Upper Lower Extra                                </span>
<span class="go">A     a     alpha   3   6  -9  -6  -6  -2   0   9</span>
<span class="go">            beta   -5  -3   3  -8  -3  -2   5   8</span>
<span class="go">      b     alpha  -4   4  -1   0   7  -4   6   6</span>
<span class="go">            beta   -9   9  -6   8   5  -2  -9  -8</span>
<span class="go">B     a     alpha   0  -9   1  -6   2   9  -7  -9</span>

<span class="gp">In [131]: </span><span class="n">df_ex</span><span class="o">.</span><span class="n">reorder_levels</span><span class="p">([</span><span class="mi">2</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span> <span class="c1"># the integers refer to level positions in the original index</span>
<span class="gh">Out[131]: </span>
<span class="go">Big                 C               D            </span>
<span class="go">Small               c       d       c       d    </span>
<span class="go">Other             cat dog cat dog cat dog cat dog</span>
<span class="go">Extra Upper Lower                                </span>
<span class="go">alpha A     a       3   6  -9  -6  -6  -2   0   9</span>
<span class="go">beta  A     a      -5  -3   3  -8  -3  -2   5   8</span>
<span class="go">alpha A     b      -4   4  -1   0   7  -4   6   6</span>
<span class="go">beta  A     b      -9   9  -6   8   5  -2  -9  -8</span>
<span class="go">alpha B     a       0  -9   1  -6   2   9  -7  -9</span>
</pre></div>
</div>
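<p>As a small sketch of how the two methods relate, swapping two levels is the same as reordering with those two positions exchanged. Both methods also accept level names. The frame <code class="docutils literal notranslate"><span class="pre">df_demo</span></code> is a hypothetical, simplified table with plain columns:</p>

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product(
    [["A", "B"], ["a", "b"], ["alpha", "beta"]],
    names=("Upper", "Lower", "Extra"))
# Hypothetical table with single-level columns, for illustration only.
df_demo = pd.DataFrame(np.arange(16).reshape(8, 2), index=rows, columns=["x", "y"])

# Swap the outermost and innermost row levels, referring to them by name.
swapped = df_demo.swaplevel("Upper", "Extra", axis=0)

# The same arrangement written as a full ordering of all three levels.
reordered = df_demo.reorder_levels(["Extra", "Lower", "Upper"], axis=0)

print(swapped.equals(reordered))
```

<p>Neither call changes the order of the rows; if the rows should be regrouped by the new outer level, a later <code class="docutils literal notranslate"><span class="pre">sort_index</span></code> call can do that.</p>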
<div class="note admonition">
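<p class="admonition-title">Swapping index levels across axes</p>
<blockquote>
<div><p>Only swaps within the row index or within the column index are covered here; swaps between the row and column directions are discussed in Chapter 5.</p>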
</div></blockquote>
</div>
<p>To drop a level of the index, use the <code class="docutils literal notranslate"><span class="pre">droplevel</span></code> method:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [132]: </span><span class="n">df_ex</span><span class="o">.</span><span class="n">droplevel</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="gh">Out[132]: </span>
<span class="go">Big                 C               D            </span>
<span class="go">Other             cat dog cat dog cat dog cat dog</span>
<span class="go">Upper Lower Extra                                </span>
<span class="go">A     a     alpha   3   6  -9  -6  -6  -2   0   9</span>
<span class="go">            beta   -5  -3   3  -8  -3  -2   5   8</span>
<span class="go">      b     alpha  -4   4  -1   0   7  -4   6   6</span>
<span class="go">            beta   -9   9  -6   8   5  -2  -9  -8</span>
<span class="go">B     a     alpha   0  -9   1  -6   2   9  -7  -9</span>
<span class="go">            beta   -9  -5  -4  -3  -1   8   6  -5</span>
<span class="go">      b     alpha   0   1  -8  -8  -2   0  -6  -3</span>
<span class="go">            beta    2   5   9  -9   5  -6   3   1</span>

<span class="gp">In [133]: </span><span class="n">df_ex</span><span class="o">.</span><span class="n">droplevel</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="gh">Out[133]: </span>
<span class="go">Big     C               D            </span>
<span class="go">Small   c       d       c       d    </span>
<span class="go">Other cat dog cat dog cat dog cat dog</span>
<span class="go">Extra                                </span>
<span class="go">alpha   3   6  -9  -6  -6  -2   0   9</span>
<span class="go">beta   -5  -3   3  -8  -3  -2   5   8</span>
<span class="go">alpha  -4   4  -1   0   7  -4   6   6</span>
<span class="go">beta   -9   9  -6   8   5  -2  -9  -8</span>
<span class="go">alpha   0  -9   1  -6   2   9  -7  -9</span>
<span class="go">beta   -9  -5  -4  -3  -1   8   6  -5</span>
<span class="go">alpha   0   1  -8  -8  -2   0  -6  -3</span>
<span class="go">beta    2   5   9  -9   5  -6   3   1</span>
</pre></div>
</div>
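<p>A brief sketch of one consequence of dropping levels: the remaining labels may no longer identify rows uniquely, as the repeated <code class="docutils literal notranslate"><span class="pre">alpha</span></code> and <code class="docutils literal notranslate"><span class="pre">beta</span></code> labels above suggest. Levels are referred to by name here, and <code class="docutils literal notranslate"><span class="pre">df_demo</span></code> is again a hypothetical simplified table:</p>

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product(
    [["A", "B"], ["a", "b"], ["alpha", "beta"]],
    names=("Upper", "Lower", "Extra"))
df_demo = pd.DataFrame(np.arange(16).reshape(8, 2), index=rows, columns=["x", "y"])

# Drop the two outer row levels by name; only "Extra" remains.
dropped = df_demo.droplevel(["Upper", "Lower"], axis=0)

print(dropped.index.is_unique)  # the remaining labels repeat
```

<p>Checking <code class="docutils literal notranslate"><span class="pre">is_unique</span></code> after dropping levels is a cheap way to notice when label-based selection will start returning several rows.</p>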
</section>
<section id="id12">
<h3>2. Modifying Index Attributes<a class="headerlink" href="#id12" title="Permalink to this heading">#</a></h3>
<p><code class="docutils literal notranslate"><span class="pre">rename_axis</span></code> changes the names of index levels; the usual way is to pass a dictionary mapping:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [134]: </span><span class="n">df_ex</span><span class="o">.</span><span class="n">rename_axis</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;Upper&#39;</span><span class="p">:</span><span class="s1">&#39;Changed_row&#39;</span><span class="p">},</span>
<span class="gp">   .....: </span>                  <span class="n">columns</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;Other&#39;</span><span class="p">:</span><span class="s1">&#39;Changed_Col&#39;</span><span class="p">})</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gp">   .....: </span>
<span class="gh">Out[134]: </span>
<span class="go">Big                       C               D            </span>
<span class="go">Small                     c       d       c       d    </span>
<span class="go">Changed_Col             cat dog cat dog cat dog cat dog</span>
<span class="go">Changed_row Lower Extra                                </span>
<span class="go">A           a     alpha   3   6  -9  -6  -6  -2   0   9</span>
<span class="go">                  beta   -5  -3   3  -8  -3  -2   5   8</span>
<span class="go">            b     alpha  -4   4  -1   0   7  -4   6   6</span>
<span class="go">                  beta   -9   9  -6   8   5  -2  -9  -8</span>
<span class="go">B           a     alpha   0  -9   1  -6   2   9  -7  -9</span>
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">rename</span></code> changes the values of the index; for a multi-level index, specify the level to modify with <code class="docutils literal notranslate"><span class="pre">level</span></code>:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [135]: </span><span class="n">df_ex</span><span class="o">.</span><span class="n">rename</span><span class="p">(</span><span class="n">columns</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;cat&#39;</span><span class="p">:</span><span class="s1">&#39;not_cat&#39;</span><span class="p">},</span>
<span class="gp">   .....: </span>             <span class="n">level</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gp">   .....: </span>
<span class="gh">Out[135]: </span>
<span class="go">Big                     C                       D                </span>
<span class="go">Small                   c           d           c           d    </span>
<span class="go">Other             not_cat dog not_cat dog not_cat dog not_cat dog</span>
<span class="go">Upper Lower Extra                                                </span>
<span class="go">A     a     alpha       3   6      -9  -6      -6  -2       0   9</span>
<span class="go">            beta       -5  -3       3  -8      -3  -2       5   8</span>
<span class="go">      b     alpha      -4   4      -1   0       7  -4       6   6</span>
<span class="go">            beta       -9   9      -6   8       5  -2      -9  -8</span>
<span class="go">B     a     alpha       0  -9       1  -6       2   9      -7  -9</span>
</pre></div>
</div>
<p>The argument can also be a function, which receives each index element as its input:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [136]: </span><span class="n">df_ex</span><span class="o">.</span><span class="n">rename</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span><span class="nb">str</span><span class="o">.</span><span class="n">upper</span><span class="p">(</span><span class="n">x</span><span class="p">),</span>
<span class="gp">   .....: </span>             <span class="n">level</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gp">   .....: </span>
<span class="gh">Out[136]: </span>
<span class="go">Big                 C               D            </span>
<span class="go">Small               c       d       c       d    </span>
<span class="go">Other             cat dog cat dog cat dog cat dog</span>
<span class="go">Upper Lower Extra                                </span>
<span class="go">A     a     ALPHA   3   6  -9  -6  -6  -2   0   9</span>
<span class="go">            BETA   -5  -3   3  -8  -3  -2   5   8</span>
<span class="go">      b     ALPHA  -4   4  -1   0   7  -4   6   6</span>
<span class="go">            BETA   -9   9  -6   8   5  -2  -9  -8</span>
<span class="go">B     a     ALPHA   0  -9   1  -6   2   9  -7  -9</span>
</pre></div>
</div>
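<p>The two ways of calling <code class="docutils literal notranslate"><span class="pre">rename</span></code> shown above can be sketched side by side, this time selecting the level by name instead of by position. The frame <code class="docutils literal notranslate"><span class="pre">df_demo</span></code> and the replacement label <code class="docutils literal notranslate"><span class="pre">first</span></code> are hypothetical:</p>

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product(
    [["A", "B"], ["a", "b"], ["alpha", "beta"]],
    names=("Upper", "Lower", "Extra"))
df_demo = pd.DataFrame(np.arange(16).reshape(8, 2), index=rows, columns=["x", "y"])

# A function is applied to every label of the chosen level only.
upper_case = df_demo.rename(index=str.upper, level="Extra")

# A dictionary replaces only the labels it mentions; others are left alone.
mapped = df_demo.rename(index={"alpha": "first"}, level="Extra")

print(upper_case.index.get_level_values("Extra").unique().tolist())
print(mapped.index.get_level_values("Extra").unique().tolist())
```

<p>Restricting the change with <code class="docutils literal notranslate"><span class="pre">level</span></code> keeps labels in the other levels untouched, which matters when the same string appears in more than one level.</p>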
<div class="hint admonition">
<p class="admonition-title">Exercise</p>
<blockquote>
<div><p>Try using a function in <code class="docutils literal notranslate"><span class="pre">rename_axis</span></code> to achieve the same result as in the example, i.e. replace <code class="docutils literal notranslate"><span class="pre">Upper</span></code> and <code class="docutils literal notranslate"><span class="pre">Other</span></code> with <code class="docutils literal notranslate"><span class="pre">Changed_row</span></code> and <code class="docutils literal notranslate"><span class="pre">Changed_Col</span></code> respectively.</p>
</div></blockquote>
</div>
<p>To replace every element of an index level at once, the new values can be drawn from an iterator:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [137]: </span><span class="n">new_values</span> <span class="o">=</span> <span class="nb">iter</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="s1">&#39;abcdefgh&#39;</span><span class="p">))</span>

<span class="gp">In [138]: </span><span class="n">df_ex</span><span class="o">.</span><span class="n">rename</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span><span class="nb">next</span><span class="p">(</span><span class="n">new_values</span><span class="p">),</span>
<span class="gp">   .....: </span>             <span class="n">level</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="gp">   .....: </span>
<span class="gh">Out[138]: </span>
<span class="go">Big                 C               D            </span>
<span class="go">Small               c       d       c       d    </span>
<span class="go">Other             cat dog cat dog cat dog cat dog</span>
<span class="go">Upper Lower Extra                                </span>
<span class="go">A     a     a       3   6  -9  -6  -6  -2   0   9</span>
<span class="go">            b      -5  -3   3  -8  -3  -2   5   8</span>
<span class="go">      b     c      -4   4  -1   0   7  -4   6   6</span>
<span class="go">            d      -9   9  -6   8   5  -2  -9  -8</span>
<span class="go">B     a     e       0  -9   1  -6   2   9  -7  -9</span>
<span class="go">            f      -9  -5  -4  -3  -1   8   6  -5</span>
<span class="go">      b     g       0   1  -8  -8  -2   0  -6  -3</span>
<span class="go">            h       2   5   9  -9   5  -6   3   1</span>
</pre></div>
</div>
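<p>To modify the element at a particular position, the single-level case is straightforward: take the <code class="docutils literal notranslate"><span class="pre">values</span></code> attribute of the index, modify a copy of the resulting array, and assign it back to the <code class="docutils literal notranslate"><span class="pre">index</span></code> attribute. With a multi-level index this is more cumbersome. One workaround is to temporarily turn the relevant index level into a column of the table, modify it there, and then set it as an index level again; the next section introduces these operations.</p>
<p>Another method worth introducing is <code class="docutils literal notranslate"><span class="pre">map</span></code>, which is defined on <code class="docutils literal notranslate"><span class="pre">Index</span></code>. It resembles the function-based use of <code class="docutils literal notranslate"><span class="pre">rename</span></code> on a single level, except that the function receives the whole index tuple rather than the scalar value of one level, which makes cross-level modifications convenient. For example, the uppercase conversion above can be written equivalently as:</p>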
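<p>The two approaches for changing a label at a given position can be sketched as follows. The tables <code class="docutils literal notranslate"><span class="pre">df_single</span></code> and <code class="docutils literal notranslate"><span class="pre">df_multi</span></code> are small hypothetical examples, not the <code class="docutils literal notranslate"><span class="pre">df_ex</span></code> table used above, and the multi-level part relies on <code class="docutils literal notranslate"><span class="pre">reset_index</span></code> and <code class="docutils literal notranslate"><span class="pre">set_index</span></code>, which the next section covers.</p>

```python
import pandas as pd

# Single-level index: copy the labels, edit one position, assign back.
# (df_single is a hypothetical table used only for illustration.)
df_single = pd.DataFrame({'v': [1, 2, 3]}, index=['x', 'y', 'z'])
labels = df_single.index.tolist()   # a plain list, safe to modify
labels[1] = 'Y'
df_single.index = labels

# Multi-level index: move one level into a column, edit it, then set it back.
idx = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b']],
                                 names=['Upper', 'Lower'])
df_multi = pd.DataFrame({'v': range(4)}, index=idx)
tmp = df_multi.reset_index('Lower')                # 'Lower' becomes a column
tmp.iloc[1, tmp.columns.get_loc('Lower')] = 'z'    # edit the second row only
df_multi = tmp.set_index('Lower', append=True)     # back to (Upper, Lower)
```

<p>Working on a list copy rather than on the array returned by <code class="docutils literal notranslate"><span class="pre">values</span></code> avoids mutating the data that backs the existing index.</p>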
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [139]: </span><span class="n">df_temp</span> <span class="o">=</span> <span class="n">df_ex</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>

<span class="gp">In [140]: </span><span class="n">new_idx</span> <span class="o">=</span> <span class="n">df_temp</span><span class="o">.</span><span class="n">index</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span>
<span class="gp">   .....: </span>                                       <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span>
<span class="gp">   .....: </span>                                       <span class="nb">str</span><span class="o">.</span><span class="n">upper</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="mi">2</span><span class="p">])))</span>
<span class="gp">   .....: </span>

<span class="gp">In [141]: </span><span class="n">df_temp</span><span class="o">.</span><span class="n">index</span> <span class="o">=</span> <span class="n">new_idx</span>

<span class="gp">In [142]: </span><span class="n">df_temp</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[142]: </span>
<span class="go">Big                 C               D            </span>
<span class="go">Small               c       d       c       d    </span>
<span class="go">Other             cat dog cat dog cat dog cat dog</span>
<span class="go">Upper Lower Extra                                </span>
<span class="go">A     a     ALPHA   3   6  -9  -6  -6  -2   0   9</span>
<span class="go">            BETA   -5  -3   3  -8  -3  -2   5   8</span>
<span class="go">      b     ALPHA  -4   4  -1   0   7  -4   6   6</span>
<span class="go">            BETA   -9   9  -6   8   5  -2  -9  -8</span>
<span class="go">B     a     ALPHA   0  -9   1  -6   2   9  -7  -9</span>
</pre></div>
</div>
<p>Another use of <code class="docutils literal notranslate"><span class="pre">map</span></code> is to compress a multi-level index into a single level, which is useful for some operations in Chapters 4 and 5:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [143]: </span><span class="n">df_temp</span> <span class="o">=</span> <span class="n">df_ex</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>

<span class="gp">In [144]: </span><span class="n">new_idx</span> <span class="o">=</span> <span class="n">df_temp</span><span class="o">.</span><span class="n">index</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">+</span><span class="s1">&#39;-&#39;</span><span class="o">+</span>
<span class="gp">   .....: </span>                                       <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">+</span><span class="s1">&#39;-&#39;</span><span class="o">+</span>
<span class="gp">   .....: </span>                                       <span class="n">x</span><span class="p">[</span><span class="mi">2</span><span class="p">]))</span>
<span class="gp">   .....: </span>

<span class="gp">In [145]: </span><span class="n">df_temp</span><span class="o">.</span><span class="n">index</span> <span class="o">=</span> <span class="n">new_idx</span>

<span class="gp">In [146]: </span><span class="n">df_temp</span><span class="o">.</span><span class="n">head</span><span class="p">()</span> <span class="c1"># single-level index</span>
<span class="gh">Out[146]: </span>
<span class="go">Big         C               D            </span>
<span class="go">Small       c       d       c       d    </span>
<span class="go">Other     cat dog cat dog cat dog cat dog</span>
<span class="go">A-a-alpha   3   6  -9  -6  -6  -2   0   9</span>
<span class="go">A-a-beta   -5  -3   3  -8  -3  -2   5   8</span>
<span class="go">A-b-alpha  -4   4  -1   0   7  -4   6   6</span>
<span class="go">A-b-beta   -9   9  -6   8   5  -2  -9  -8</span>
<span class="go">B-a-alpha   0  -9   1  -6   2   9  -7  -9</span>
</pre></div>
</div>
<p>Conversely, the index can be expanded back into multiple levels:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [147]: </span><span class="n">new_idx</span> <span class="o">=</span> <span class="n">df_temp</span><span class="o">.</span><span class="n">index</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span><span class="nb">tuple</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39;-&#39;</span><span class="p">)))</span>

<span class="gp">In [148]: </span><span class="n">df_temp</span><span class="o">.</span><span class="n">index</span> <span class="o">=</span> <span class="n">new_idx</span>

<span class="gp">In [149]: </span><span class="n">df_temp</span><span class="o">.</span><span class="n">head</span><span class="p">()</span> <span class="c1"># three-level index</span>
<span class="gh">Out[149]: </span>
<span class="go">Big         C               D            </span>
<span class="go">Small       c       d       c       d    </span>
<span class="go">Other     cat dog cat dog cat dog cat dog</span>
<span class="go">A a alpha   3   6  -9  -6  -6  -2   0   9</span>
<span class="go">    beta   -5  -3   3  -8  -3  -2   5   8</span>
<span class="go">  b alpha  -4   4  -1   0   7  -4   6   6</span>
<span class="go">    beta   -9   9  -6   8   5  -2  -9  -8</span>
<span class="go">B a alpha   0  -9   1  -6   2   9  -7  -9</span>
</pre></div>
</div>
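<p>As the last output suggests, the level names do not survive the round trip through a flat index. A minimal sketch of compressing, expanding, and then restoring the names is given below; the index <code class="docutils literal notranslate"><span class="pre">idx</span></code> is a small hypothetical example, and <code class="docutils literal notranslate"><span class="pre">'-'.join</span></code> is used as a shorthand for the explicit concatenation shown above.</p>

```python
import pandas as pd

# Hypothetical two-level index used only for illustration.
idx = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b']],
                                 names=['Upper', 'Lower'])

# Compress: each element passed to the function is a tuple of level values.
flat = idx.map(lambda x: '-'.join(x))

# Expand: returning tuples makes map build a MultiIndex again.
expanded = flat.map(lambda x: tuple(x.split('-')))

# The level names were lost when flattening, so put them back explicitly.
restored = expanded.set_names(idx.names)
```

<p>This only works when the separator never occurs inside a label; otherwise <code class="docutils literal notranslate"><span class="pre">split</span></code> would produce tuples of the wrong length.</p>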
</section>
<section id="id13">
<h3>3. Setting and Resetting the Index<a class="headerlink" href="#id13" title="Permalink to this heading">#</a></h3>
<p>To illustrate the functions in this section, we first construct a new table:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [150]: </span><span class="n">df_new</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s1">&#39;A&#39;</span><span class="p">:</span><span class="nb">list</span><span class="p">(</span><span class="s1">&#39;aacd&#39;</span><span class="p">),</span>
<span class="gp">   .....: </span>                       <span class="s1">&#39;B&#39;</span><span class="p">:</span><span class="nb">list</span><span class="p">(</span><span class="s1">&#39;PQRT&#39;</span><span class="p">),</span>
<span class="gp">   .....: </span>                       <span class="s1">&#39;C&#39;</span><span class="p">:[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">]})</span>
<span class="gp">   .....: </span>

<span class="gp">In [151]: </span><span class="n">df_new</span>
<span class="gh">Out[151]: </span>
<span class="go">   A  B  C</span>
<span class="go">0  a  P  1</span>
<span class="go">1  a  Q  2</span>
<span class="go">2  c  R  3</span>
<span class="go">3  d  T  4</span>
</pre></div>
</div>
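<p>An index can be set with <code class="docutils literal notranslate"><span class="pre">set_index</span></code>. Its main parameter is <code class="docutils literal notranslate"><span class="pre">append</span></code>, which specifies whether to keep the existing index; if it is kept, the newly set levels are added as inner levels of the existing index:</p>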
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [152]: </span><span class="n">df_new</span><span class="o">.</span><span class="n">set_index</span><span class="p">(</span><span class="s1">&#39;A&#39;</span><span class="p">)</span>
<span class="gh">Out[152]: </span>
<span class="go">   B  C</span>
<span class="go">A      </span>
<span class="go">a  P  1</span>
<span class="go">a  Q  2</span>
<span class="go">c  R  3</span>
<span class="go">d  T  4</span>

<span class="gp">In [153]: </span><span class="n">df_new</span><span class="o">.</span><span class="n">set_index</span><span class="p">(</span><span class="s1">&#39;A&#39;</span><span class="p">,</span> <span class="n">append</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="gh">Out[153]: </span>
<span class="go">     B  C</span>
<span class="go">  A      </span>
<span class="go">0 a  P  1</span>
<span class="go">1 a  Q  2</span>
<span class="go">2 c  R  3</span>
<span class="go">3 d  T  4</span>
</pre></div>
</div>
<p>Several columns can be specified as the index at once:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [154]: </span><span class="n">df_new</span><span class="o">.</span><span class="n">set_index</span><span class="p">([</span><span class="s1">&#39;A&#39;</span><span class="p">,</span> <span class="s1">&#39;B&#39;</span><span class="p">])</span>
<span class="gh">Out[154]: </span>
<span class="go">     C</span>
<span class="go">A B   </span>
<span class="go">a P  1</span>
<span class="go">  Q  2</span>
<span class="go">c R  3</span>
<span class="go">d T  4</span>
</pre></div>
</div>
<p>If the values to be used as an index level are not a column of the table, the corresponding <code class="docutils literal notranslate"><span class="pre">Series</span></code> can be passed directly as an argument:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [155]: </span><span class="n">my_index</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="s1">&#39;WXYZ&#39;</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;D&#39;</span><span class="p">)</span>

<span class="gp">In [156]: </span><span class="n">df_new</span> <span class="o">=</span> <span class="n">df_new</span><span class="o">.</span><span class="n">set_index</span><span class="p">([</span><span class="s1">&#39;A&#39;</span><span class="p">,</span> <span class="n">my_index</span><span class="p">])</span>

<span class="gp">In [157]: </span><span class="n">df_new</span>
<span class="gh">Out[157]: </span>
<span class="go">     B  C</span>
<span class="go">A D      </span>
<span class="go">a W  P  1</span>
<span class="go">  X  Q  2</span>
<span class="go">c Y  R  3</span>
<span class="go">d Z  T  4</span>
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">reset_index</span></code> is the inverse of <code class="docutils literal notranslate"><span class="pre">set_index</span></code>. Its main parameter is <code class="docutils literal notranslate"><span class="pre">drop</span></code>, which specifies whether the removed index levels are discarded instead of being added back as columns:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [158]: </span><span class="n">df_new</span><span class="o">.</span><span class="n">reset_index</span><span class="p">([</span><span class="s1">&#39;D&#39;</span><span class="p">])</span>
<span class="gh">Out[158]: </span>
<span class="go">   D  B  C</span>
<span class="go">A         </span>
<span class="go">a  W  P  1</span>
<span class="go">a  X  Q  2</span>
<span class="go">c  Y  R  3</span>
<span class="go">d  Z  T  4</span>

<span class="gp">In [159]: </span><span class="n">df_new</span><span class="o">.</span><span class="n">reset_index</span><span class="p">([</span><span class="s1">&#39;D&#39;</span><span class="p">],</span> <span class="n">drop</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="gh">Out[159]: </span>
<span class="go">   B  C</span>
<span class="go">A      </span>
<span class="go">a  P  1</span>
<span class="go">a  Q  2</span>
<span class="go">c  R  3</span>
<span class="go">d  T  4</span>
</pre></div>
</div>
<p>If all index levels are reset, <code class="docutils literal notranslate"><span class="pre">pandas</span></code> generates a new default index:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [160]: </span><span class="n">df_new</span><span class="o">.</span><span class="n">reset_index</span><span class="p">()</span>
<span class="gh">Out[160]: </span>
<span class="go">   A  D  B  C</span>
<span class="go">0  a  W  P  1</span>
<span class="go">1  a  X  Q  2</span>
<span class="go">2  c  Y  R  3</span>
<span class="go">3  d  Z  T  4</span>
</pre></div>
</div>
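<p>To make the inverse relationship concrete, the sketch below rebuilds the starting table of this section under the hypothetical name <code class="docutils literal notranslate"><span class="pre">df</span></code>, sets two columns as the index, and resets them again. The variable names are illustrative only.</p>

```python
import pandas as pd

# Same data as the table built at the start of this section.
df = pd.DataFrame({'A': list('aacd'),
                   'B': list('PQRT'),
                   'C': [1, 2, 3, 4]})

indexed = df.set_index(['A', 'B'])   # columns A and B become index levels
restored = indexed.reset_index()     # the levels return as leading columns

# drop=True discards level B instead of turning it back into a column.
without_b = indexed.reset_index('B', drop=True)
```

<p>Here <code class="docutils literal notranslate"><span class="pre">restored</span></code> matches the original table because <code class="docutils literal notranslate"><span class="pre">A</span></code> and <code class="docutils literal notranslate"><span class="pre">B</span></code> were already the leading columns; in general the reset levels are placed first, so the column order may differ from the original.</p>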
</section>
<section id="id14">
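<h3>4. Reshaping the Index<a class="headerlink" href="#id14" title="Permalink to this heading">#</a></h3>
<p>In some situations the index needs to be extended or trimmed. More precisely, given a new index, the elements of the original table under matching labels are filled into a table built on the new index. For example, the table below holds employee information, and we need to build a new table that adds one employee, drops the height column, and adds a gender column:</p>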
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [161]: </span><span class="n">df_reindex</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s2">&quot;Weight&quot;</span><span class="p">:[</span><span class="mi">60</span><span class="p">,</span><span class="mi">70</span><span class="p">,</span><span class="mi">80</span><span class="p">],</span>
<span class="gp">   .....: </span>                           <span class="s2">&quot;Height&quot;</span><span class="p">:[</span><span class="mi">176</span><span class="p">,</span><span class="mi">180</span><span class="p">,</span><span class="mi">179</span><span class="p">]},</span>
<span class="gp">   .....: </span>                           <span class="n">index</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;1001&#39;</span><span class="p">,</span><span class="s1">&#39;1003&#39;</span><span class="p">,</span><span class="s1">&#39;1002&#39;</span><span class="p">])</span>
<span class="gp">   .....: </span>

<span class="gp">In [162]: </span><span class="n">df_reindex</span>
<span class="gh">Out[162]: </span>
<span class="go">      Weight  Height</span>
<span class="go">1001      60     176</span>
<span class="go">1003      70     180</span>
<span class="go">1002      80     179</span>

<span class="gp">In [163]: </span><span class="n">df_reindex</span><span class="o">.</span><span class="n">reindex</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;1001&#39;</span><span class="p">,</span><span class="s1">&#39;1002&#39;</span><span class="p">,</span><span class="s1">&#39;1003&#39;</span><span class="p">,</span><span class="s1">&#39;1004&#39;</span><span class="p">],</span>
<span class="gp">   .....: </span>                   <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;Weight&#39;</span><span class="p">,</span><span class="s1">&#39;Gender&#39;</span><span class="p">])</span>
<span class="gp">   .....: </span>
<span class="gh">Out[163]: </span>
<span class="go">      Weight  Gender</span>
<span class="go">1001    60.0     NaN</span>
<span class="go">1002    80.0     NaN</span>
<span class="go">1003    70.0     NaN</span>
<span class="go">1004     NaN     NaN</span>
</pre></div>
</div>
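<p>This need often arises when filling in missing time points of a time series index or when extending a range of <code class="docutils literal notranslate"><span class="pre">ID</span></code> numbers. Note also that the data of the original table is aligned to the new table by index label. For example, employee 1002 originally comes after 1003 while the order is reversed in the new table; <code class="docutils literal notranslate"><span class="pre">reindex</span></code> aligns by label, regardless of position.</p>
<p>A function similar to <code class="docutils literal notranslate"><span class="pre">reindex</span></code> is <code class="docutils literal notranslate"><span class="pre">reindex_like</span></code>, which reshapes the index of the calling table to match that of the table passed in. For example, if a table with the target index already exists, the result above can be obtained as follows:</p>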
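<p>For the time series case, a minimal sketch is shown below. The series <code class="docutils literal notranslate"><span class="pre">s</span></code> and the date range are hypothetical; <code class="docutils literal notranslate"><span class="pre">method</span></code> and <code class="docutils literal notranslate"><span class="pre">fill_value</span></code> are optional parameters of <code class="docutils literal notranslate"><span class="pre">reindex</span></code> that control how newly introduced labels are filled.</p>

```python
import pandas as pd

# Hypothetical daily range in which only the 1st and 3rd days are observed.
full_range = pd.date_range('2020-01-01', '2020-01-04', freq='D')
s = pd.Series([1.0, 2.0], index=full_range[[0, 2]])

with_gaps = s.reindex(full_range)                # new dates get NaN
forward = s.reindex(full_range, method='ffill')  # carry the last value forward
zeros = s.reindex(full_range, fill_value=0.0)    # or fill with a constant
```

<p>Without either parameter the new positions hold <code class="docutils literal notranslate"><span class="pre">NaN</span></code>, which is also why the integer <code class="docutils literal notranslate"><span class="pre">Weight</span></code> column above is shown as floating point after reindexing.</p>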
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [164]: </span><span class="n">df_existed</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;1001&#39;</span><span class="p">,</span><span class="s1">&#39;1002&#39;</span><span class="p">,</span><span class="s1">&#39;1003&#39;</span><span class="p">,</span><span class="s1">&#39;1004&#39;</span><span class="p">],</span>
<span class="gp">   .....: </span>                          <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;Weight&#39;</span><span class="p">,</span><span class="s1">&#39;Gender&#39;</span><span class="p">])</span>
<span class="gp">   .....: </span>

<span class="gp">In [165]: </span><span class="n">df_reindex</span><span class="o">.</span><span class="n">reindex_like</span><span class="p">(</span><span class="n">df_existed</span><span class="p">)</span>
<span class="gh">Out[165]: </span>
<span class="go">      Weight  Gender</span>
<span class="go">1001    60.0     NaN</span>
<span class="go">1002    80.0     NaN</span>
<span class="go">1003    70.0     NaN</span>
<span class="go">1004     NaN     NaN</span>
</pre></div>
</div>
</section>
</section>
<section id="id15">
<h2>IV. Index Operations<a class="headerlink" href="#id15" title="Permalink to this heading">#</a></h2>
<section id="id16">
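<h3>1. Rules of Set Operations<a class="headerlink" href="#id16" title="Permalink to this heading">#</a></h3>
<p>A common need is to use set operations to pick out the rows that satisfy a condition. For example, suppose two tables <code class="docutils literal notranslate"><span class="pre">A</span></code> and <code class="docutils literal notranslate"><span class="pre">B</span></code> are both indexed by employee ID, and we want to select the information of all employees in the intersection of the two indexes. This is easy to do with the operations defined on <code class="docutils literal notranslate"><span class="pre">Index</span></code>.</p>
<p>Before that, let us review the four common set operations:</p>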
<div class="math notranslate nohighlight">
\[\begin{split}\rm
S_A.intersection(S_B) &amp;= \rm S_A \cap S_B \Leftrightarrow \rm \{x|x\in S_A\, and\, x\in S_B\}\\
\rm
S_A.union(S_B) &amp;= \rm S_A \cup S_B \Leftrightarrow \rm \{x|x\in S_A\, or\, x\in S_B\}\\
\rm
S_A.difference(S_B) &amp;= \rm S_A - S_B \Leftrightarrow \rm \{x|x\in S_A\, and\, x\notin S_B\}\\
\rm
S_A.symmetric\_difference(S_B) &amp;= \rm S_A\triangle S_B\Leftrightarrow \rm \{x|x\in S_A\cup S_B - S_A\cap S_B\}\end{split}\]</div>
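<p>The four definitions can be checked quickly with Python's built-in <code class="docutils literal notranslate"><span class="pre">set</span></code> type, whose methods happen to share the names used above. The sets <code class="docutils literal notranslate"><span class="pre">s_a</span></code> and <code class="docutils literal notranslate"><span class="pre">s_b</span></code> are hypothetical examples.</p>

```python
# Two small hypothetical sets.
s_a = {'a', 'b'}
s_b = {'b', 'c'}

inter = s_a.intersection(s_b)        # elements in both sets
union = s_a.union(s_b)               # elements in either set
diff = s_a.difference(s_b)           # elements in s_a but not in s_b
sym = s_a.symmetric_difference(s_b)  # elements in exactly one of the sets

# The symmetric difference equals the union minus the intersection.
same = (sym == union - inter)
```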
</section>
<section id="id17">
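<h3>2. General Index Operations<a class="headerlink" href="#id17" title="Permalink to this heading">#</a></h3>
<p>The elements of a set are distinct, but an index may contain repeated elements, so deduplicate with <code class="docutils literal notranslate"><span class="pre">unique</span></code> before performing the operations. Below, two minimal example tables are constructed for demonstration:</p>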
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [166]: </span><span class="n">df_set_1</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">([[</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">],[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">],[</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">]],</span>
<span class="gp">   .....: </span>                        <span class="n">index</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Index</span><span class="p">([</span><span class="s1">&#39;a&#39;</span><span class="p">,</span><span class="s1">&#39;b&#39;</span><span class="p">,</span><span class="s1">&#39;a&#39;</span><span class="p">],</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;id1&#39;</span><span class="p">))</span>
<span class="gp">   .....: </span>

<span class="gp">In [167]: </span><span class="n">df_set_2</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">([[</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">],[</span><span class="mi">2</span><span class="p">,</span><span class="mi">6</span><span class="p">],[</span><span class="mi">7</span><span class="p">,</span><span class="mi">1</span><span class="p">]],</span>
<span class="gp">   .....: </span>                        <span class="n">index</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Index</span><span class="p">([</span><span class="s1">&#39;b&#39;</span><span class="p">,</span><span class="s1">&#39;b&#39;</span><span class="p">,</span><span class="s1">&#39;c&#39;</span><span class="p">],</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;id2&#39;</span><span class="p">))</span>
<span class="gp">   .....: </span>

<span class="gp">In [168]: </span><span class="n">id1</span><span class="p">,</span> <span class="n">id2</span> <span class="o">=</span> <span class="n">df_set_1</span><span class="o">.</span><span class="n">index</span><span class="o">.</span><span class="n">unique</span><span class="p">(),</span> <span class="n">df_set_2</span><span class="o">.</span><span class="n">index</span><span class="o">.</span><span class="n">unique</span><span class="p">()</span>

<span class="gp">In [169]: </span><span class="n">id1</span><span class="o">.</span><span class="n">intersection</span><span class="p">(</span><span class="n">id2</span><span class="p">)</span>
<span class="gh">Out[169]: </span><span class="go">Index([&#39;b&#39;], dtype=&#39;object&#39;)</span>

<span class="gp">In [170]: </span><span class="n">id1</span><span class="o">.</span><span class="n">union</span><span class="p">(</span><span class="n">id2</span><span class="p">)</span>
<span class="gh">Out[170]: </span><span class="go">Index([&#39;a&#39;, &#39;b&#39;, &#39;c&#39;], dtype=&#39;object&#39;)</span>

<span class="gp">In [171]: </span><span class="n">id1</span><span class="o">.</span><span class="n">difference</span><span class="p">(</span><span class="n">id2</span><span class="p">)</span>
<span class="gh">Out[171]: </span><span class="go">Index([&#39;a&#39;], dtype=&#39;object&#39;)</span>

<span class="gp">In [172]: </span><span class="n">id1</span><span class="o">.</span><span class="n">symmetric_difference</span><span class="p">(</span><span class="n">id2</span><span class="p">)</span>
<span class="gh">Out[172]: </span><span class="go">Index([&#39;a&#39;, &#39;c&#39;], dtype=&#39;object&#39;)</span>
</pre></div>
</div>
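<p>Returning to the motivating task of selecting the rows whose labels lie in the intersection of two indexes, one possible sketch is given below. It rebuilds the two example tables so that it runs on its own; the names <code class="docutils literal notranslate"><span class="pre">common</span></code>, <code class="docutils literal notranslate"><span class="pre">rows_1</span></code> and <code class="docutils literal notranslate"><span class="pre">rows_2</span></code> are illustrative, and using <code class="docutils literal notranslate"><span class="pre">Index.isin</span></code> for the final selection is one choice among several.</p>

```python
import pandas as pd

df_set_1 = pd.DataFrame([[0, 1], [1, 2], [3, 4]],
                        index=pd.Index(['a', 'b', 'a'], name='id1'))
df_set_2 = pd.DataFrame([[4, 5], [2, 6], [7, 1]],
                        index=pd.Index(['b', 'b', 'c'], name='id2'))

# Labels present in both indexes, computed on the deduplicated indexes.
common = df_set_1.index.unique().intersection(df_set_2.index.unique())

# A Boolean mask keeps every row whose label is in the intersection,
# including rows with repeated labels.
rows_1 = df_set_1[df_set_1.index.isin(common)]
rows_2 = df_set_2[df_set_2.index.isin(common)]
```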
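<p>If the columns on which set operations are needed have not been set as the index, one approach is to convert them to an index, perform the operation, and then restore them. Another is to use the <code class="docutils literal notranslate"><span class="pre">isin</span></code> function. For example, the following selects, from the first table after its index has been reset, the rows whose id lies in the intersection of the two id columns:</p>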
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [173]: </span><span class="n">df_set_in_col_1</span> <span class="o">=</span> <span class="n">df_set_1</span><span class="o">.</span><span class="n">reset_index</span><span class="p">()</span>

<span class="gp">In [174]: </span><span class="n">df_set_in_col_2</span> <span class="o">=</span> <span class="n">df_set_2</span><span class="o">.</span><span class="n">reset_index</span><span class="p">()</span>

<span class="gp">In [175]: </span><span class="n">df_set_in_col_1</span>
<span class="gh">Out[175]: </span>
<span class="go">  id1  0  1</span>
<span class="go">0   a  0  1</span>
<span class="go">1   b  1  2</span>
<span class="go">2   a  3  4</span>

<span class="gp">In [176]: </span><span class="n">df_set_in_col_2</span>
<span class="gh">Out[176]: </span>
<span class="go">  id2  0  1</span>
<span class="go">0   b  4  5</span>
<span class="go">1   b  2  6</span>
<span class="go">2   c  7  1</span>

<span class="gp">In [177]: </span><span class="n">df_set_in_col_1</span><span class="p">[</span><span class="n">df_set_in_col_1</span><span class="o">.</span><span class="n">id1</span><span class="o">.</span><span class="n">isin</span><span class="p">(</span><span class="n">df_set_in_col_2</span><span class="o">.</span><span class="n">id2</span><span class="p">)]</span>
<span class="gh">Out[177]: </span>
<span class="go">  id1  0  1</span>
<span class="go">1   b  1  2</span>
</pre></div>
</div>
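<p>Negating the same mask gives the set difference instead of the intersection. The sketch below uses two hypothetical tables, <code class="docutils literal notranslate"><span class="pre">col_1</span></code> and <code class="docutils literal notranslate"><span class="pre">col_2</span></code>, that mirror the id columns of the example above.</p>

```python
import pandas as pd

# Hypothetical tables whose id columns mirror the example above.
col_1 = pd.DataFrame({'id1': ['a', 'b', 'a'], 'x': [0, 1, 3]})
col_2 = pd.DataFrame({'id2': ['b', 'b', 'c'], 'x': [4, 2, 7]})

# Rows of col_1 whose id does not appear in col_2 (difference id1 - id2).
only_in_1 = col_1[~col_1['id1'].isin(col_2['id2'])]
```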
</section>
</section>
<section id="id18">
<h2>V. Exercises<a class="headerlink" href="#id18" title="Permalink to this heading">#</a></h2>
<section id="ex1">
<h3>Ex1: Company Employee Dataset<a class="headerlink" href="#ex1" title="Permalink to this heading">#</a></h3>
<p>Consider a dataset of company employees:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [178]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">&#39;data/company.csv&#39;</span><span class="p">)</span>

<span class="gp">In [179]: </span><span class="n">df</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="gh">Out[179]: </span>
<span class="go">   EmployeeID birthdate_key  age  city_name department      job_title gender</span>
<span class="go">0        1318      1/3/1954   61  Vancouver  Executive            CEO      M</span>
<span class="go">1        1319      1/3/1957   58  Vancouver  Executive      VP Stores      F</span>
<span class="go">2        1320      1/2/1955   60  Vancouver  Executive  Legal Counsel      F</span>
</pre></div>
</div>
<ol class="arabic simple">
<li><p>Using only <code class="docutils literal notranslate"><span class="pre">query</span></code> and then only <code class="docutils literal notranslate"><span class="pre">loc</span></code>, select the male employees aged forty or under who work in the <code class="docutils literal notranslate"><span class="pre">Dairy</span></code> or <code class="docutils literal notranslate"><span class="pre">Bakery</span></code> department.</p></li>
<li><p>Select the 1st, 3rd, and second-to-last columns of the rows whose employee <code class="docutils literal notranslate"><span class="pre">ID</span></code> is odd.</p></li>
<li><p>Perform index operations following the steps below:</p></li>
</ol>
<ul class="simple">
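<li><p>Set the last three columns as the index, then swap the inner and outer levels</p></li>
<li><p>Reset the middle index level back to a column</p></li>
<li><p>Rename the outer index level to <code class="docutils literal notranslate"><span class="pre">Gender</span></code></p></li>
<li><p>Merge the two row index levels with an underscore</p></li>
<li><p>Split the row index back into its previous state</p></li>
<li><p>Change the index names back to those of the original table</p></li>
<li><p>Restore the default index and keep the columns in the same relative positions as in the original table</p></li>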
</ul>
</section>
<section id="ex2">
<h3>Ex2: Chocolate Dataset<a class="headerlink" href="#ex2" title="Permalink to this heading">#</a></h3>
<p>Consider a dataset of chocolate ratings:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [180]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">&#39;data/chocolate.csv&#39;</span><span class="p">)</span>

<span class="gp">In [181]: </span><span class="n">df</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="gh">Out[181]: </span>
<span class="go">    Company  Review\r\nDate Cocoa\r\nPercent Company\r\nLocation  Rating</span>
<span class="go">0  A. Morin            2016              63%              France    3.75</span>
<span class="go">1  A. Morin            2015              70%              France    2.75</span>
<span class="go">2  A. Morin            2015              70%              France    3.00</span>
</pre></div>
</div>
<ol class="arabic simple">
<li><p>Replace <code class="docutils literal notranslate"><span class="pre">\n</span></code> in the column names with a space.</p></li>
<li><p>Chocolate <code class="docutils literal notranslate"><span class="pre">Rating</span></code> scores range from 1 to 5 in steps of 0.25. Select the samples rated 2.75 or below whose cocoa content <code class="docutils literal notranslate"><span class="pre">Cocoa</span> <span class="pre">Percent</span></code> is above the median.</p></li>
<li><p>After setting <code class="docutils literal notranslate"><span class="pre">Review</span> <span class="pre">Date</span></code> and <code class="docutils literal notranslate"><span class="pre">Company</span> <span class="pre">Location</span></code> as the index, select the samples whose <code class="docutils literal notranslate"><span class="pre">Review</span> <span class="pre">Date</span></code> is after 2012 and whose <code class="docutils literal notranslate"><span class="pre">Company</span> <span class="pre">Location</span></code> is not one of <code class="docutils literal notranslate"><span class="pre">France,</span> <span class="pre">Canada,</span> <span class="pre">Amsterdam,</span> <span class="pre">Belgium</span></code>.</p></li>
</ol>
</section>
</section>
</section>


              </article>
              

              
          </div>
          
      </div>
    </div>

  
  
  <!-- Scripts loaded after <body> so the DOM is not blocked -->
  <script src="../_static/scripts/pydata-sphinx-theme.js?digest=92025949c220c2e29695"></script>

<footer class="bd-footer"><div class="bd-footer__inner container">
  
  <div class="footer-item">
    <p class="copyright">
    &copy; Copyright 2020-2022, Datawhale, 耿远昊.<br>
</p>
  </div>
  
  <div class="footer-item">
    <p class="sphinx-version">
Created using <a href="http://sphinx-doc.org/">Sphinx</a> 5.0.2.<br>
</p>
  </div>
  
</div>
</footer>
  </body>
</html>